Python 3, Web-scraping, and Javascript [Oh My]
<p>I have come to the point of entering the melee on scraping webpages that use Javascript, with Python 3. I am well aware that my boot may be making contact with a dead horse, but I feel like drawing my six-shooter anyway. It's a spaghetti western; be my gray hat?</p>
<p><strong>::Backstory::</strong></p>
<p>I am using Python 3.2.3.</p>
<p>I am interested in gathering historical stock//etf//mutual_fund price data for YTD, 1-yr, 3-yr, 5-yr, 10-yr... and/or similar timeframes for a user-defined stock, etf, or mutual fund. I set my sights on Morningstar.com, as they tend to provide as much data as possible without necessarily requiring a log-in; other folks such as finance.google.com &amp;c tend to be inconsistent in what data they provide regarding stocks vs etfs vs mutual funds.</p>
<p>The trade-off in using Morningstar for this historical data, or "Trailing Total Returns" as they call it, is that they use Javascript to produce it.</p>
<p>Here are some example links from Morningstar:</p>
<p><a href="http://performance.morningstar.com/fund/performance-return.action?t=VHCOX" rel="nofollow">A Mutual Fund;</a></p>
<p><a href="http://performance.morningstar.com/funds/etf/total-returns.action?t=VAW" rel="nofollow">An ETF;</a></p>
<p><a href="http://performance.morningstar.com/stock/performance-return.action?t=INTC" rel="nofollow">A Stock.</a></p>
<p>I am interested in the "Trailing Returns" portion, the top row or so of numbers in the Javascript-produced chart.</p>
<p><strong>::Attempted So Far::</strong></p>
<p>I've confirmed that wget doesn't play with Javascript; even downloading all of the associated files [css, .js, &amp;c] hasn't allowed me to locally render the Javascript in a browser or in a script. Research here on StackOverflow confirmed this. Am willing to be corrected here.</p>
<p>My research informed me that Mechanize doesn't exist for Python3. I tried anyway, and turned into Policeman Javert crying out "I knew it!" at the error message "module does not exist".</p>
<p><strong>::I've Heard Of...::</strong></p>
<p>->Selenium. However, my understanding is that this requires Thy Favorite Browser to actually open up a webpage, navigate around, and then not close because there's no "close this tab//window" command//option for Selenium. What if I//my_user want to get historical data for many etfs, stocks, and/or mutual funds? That's a lot of tabs//windows opening up in a browser which was not necessarily desired to be opened.</p>
<p>->httplib2. I think this is nice, but I'm doubtful if it will play with Javascript. Does it, for example using the .cache and GET options?</p>
<pre><code>import httplib2

# Cache responses on disk in the ".cache" directory.
conn = httplib2.Http(".cache")
# request() returns a (response, content) tuple; no u"" prefix, which Python 3.2 rejects.
response, content = conn.request("http://the_url", "GET")
</code></pre>
<p>->Windmill. See 'Selenium'. I am, however, off-key enough to sing 'Man of La Mancha'.</p>
<p>->Google's <a href="http://code.google.com/p/webscraping/" rel="nofollow">webscraping</a> code. Would an attempt at downloading a Javascript-laden page result in ... positive results?</p>
<p>I've read chatter about having to "emulate a browser without a browser". Sounds like Mechanize, but not for Python3 as I currently understand.</p>
<p><strong>::My Question::</strong></p>
<p>Any suggestions, pointers, solutions, or "look over here" directions?</p>
<p>Many thanks,</p>
<p>Miles, Dusty Desert Villager.</p>
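<p>To illustrate the wget//httplib2 doubt above: a plain HTTP client receives only the markup as served and runs none of the scripts, so anything a script fills in later is simply absent. The sketch below uses only the standard library and works offline. The sample markup, the "returns-table" id, the script path, and the loadReturns call are all made-up stand-ins, not Morningstar's real page, and the names StaticPageInspector and inspect_static_html are hypothetical.</p>

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a page whose table is filled in by a script.
# It is NOT Morningstar's real HTML; the id, script path, and function are made up.
SAMPLE_HTML = """
<html><body>
  <h1>Trailing Total Returns</h1>
  <div id="returns-table"></div>
  <script src="/js/returns.js"></script>
  <script>loadReturns("VHCOX");</script>
</body></html>
"""


class StaticPageInspector(HTMLParser):
    """Collect what a client that does not run Javascript can see."""

    def __init__(self):
        super().__init__()
        self.script_sources = []   # src attributes of external scripts
        self.inline_scripts = 0    # number of inline <script> blocks
        self.visible_text = []     # text found outside <script> elements
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.script_sources.append(src)
            else:
                self.inline_scripts += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        text = data.strip()
        # Script source is code, not page content, so it is skipped here.
        if text and not self._in_script:
            self.visible_text.append(text)


def inspect_static_html(html):
    """Parse markup exactly as served, without executing any script."""
    parser = StaticPageInspector()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    page = inspect_static_html(SAMPLE_HTML)
    print("external scripts:", page.script_sources)
    print("inline scripts:", page.inline_scripts)
    print("visible text:", page.visible_text)
```

<p>In this made-up page the heading is visible but the table container stays empty, which matches what wget saw. The script references are the useful lead: they hint at where the numbers are actually loaded from.</p>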