How can I speed up fetching pages with urllib2 in Python?
<p>I have a script that fetches several web pages and parses the info.</p>
<p>(An example can be seen at <a href="http://bluedevilbooks.com/search/?DEPT=MATH&amp;CLASS=103&amp;SEC=01" rel="noreferrer">http://bluedevilbooks.com/search/?DEPT=MATH&amp;CLASS=103&amp;SEC=01</a>.)</p>
<p>I ran cProfile on it and, as I assumed, urlopen takes up a lot of time. Is there a way to fetch the pages faster, or a way to fetch several pages at once? I'll do whatever is simplest, as I'm new to Python and web development.</p>
<p>Thanks in advance! :)</p>
<p>UPDATE: I have a function called <code>fetchURLs()</code>, which I use to build an array of the URLs I need, so something like <code>urls = fetchURLs()</code>. The URLs are all XML files from the Amazon and eBay APIs (which makes me wonder why they take so long to load; maybe my web host is slow?).</p>
<p>What I need to do is load each URL, read each page, and send that data to another part of the script, which will parse and display the data.</p>
<p>Note that I can't do the latter part until ALL of the pages have been fetched; that is my issue.</p>
<p>Also, I believe my host limits me to 25 processes at a time, so whatever is easiest on the server would be nice :)</p>
<hr>
<p>Here is the profiler output:</p>
<pre><code>Sun Aug 15 20:51:22 2010    prof

         211352 function calls (209292 primitive calls) in 22.254 CPU seconds

   Ordered by: internal time
   List reduced from 404 to 10 due to restriction &lt;10&gt;

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10   18.056    1.806   18.056    1.806 {_socket.getaddrinfo}
     4991    2.730    0.001    2.730    0.001 {method 'recv' of '_socket.socket' objects}
       10    0.490    0.049    0.490    0.049 {method 'connect' of '_socket.socket' objects}
     2415    0.079    0.000    0.079    0.000 {method 'translate' of 'unicode' objects}
       12    0.061    0.005    0.745    0.062 /usr/local/lib/python2.6/HTMLParser.py:132(goahead)
     3428    0.060    0.000    0.202    0.000 /usr/local/lib/python2.6/site-packages/BeautifulSoup.py:1306(endData)
     1698    0.055    0.000    0.068    0.000 /usr/local/lib/python2.6/site-packages/BeautifulSoup.py:1351(_smartPop)
     4125    0.053    0.000    0.056    0.000 /usr/local/lib/python2.6/site-packages/BeautifulSoup.py:118(setup)
     1698    0.042    0.000    0.358    0.000 /usr/local/lib/python2.6/HTMLParser.py:224(parse_starttag)
     1698    0.042    0.000    0.275    0.000 /usr/local/lib/python2.6/site-packages/BeautifulSoup.py:1397(unknown_starttag)
</code></pre>
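
For illustration only, the requirement described above (load every URL, wait until all of them have finished, then hand the results to the parser) could be sketched with a small thread pool. This is a minimal sketch under stated assumptions, not the script from the question: it uses Python 3's `urllib.request` and `concurrent.futures` rather than the Python 2.6 `urllib2` shown in the profile, and the names `fetch`, `fetch_all`, `fetch_one`, and `max_workers` are hypothetical. In the profile, the ten `_socket.getaddrinfo` calls account for about 18 of the 22 seconds, so overlapping the requests may hide much of that wait, though it does not explain why name resolution is slow on the host.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_workers=10):
    """Fetch every URL concurrently and return the bodies in input order.

    The call returns only after all downloads have finished, so the
    parsing step can run on the complete list afterwards.
    `fetch_one` is a hypothetical hook that allows the download
    function to be swapped out, for example in tests.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`.
        return list(pool.map(fetch_one, urls))
```

A call such as `pages = fetch_all(urls)` would then block until every page is available. The pool uses threads inside a single process instead of spawning extra processes; whether that fits within a particular host's process limit is an assumption that would need checking.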
 
