
Random HTTP 503 Error using urllib and BeautifulSoup
<p>I'm scraping a website with cookies. The site provides multiple drop-down menus, and I'm iterating through each option, re-capturing the session cookies with every request. The code runs fine for a while, but I randomly get a 503 error.</p>

<p>My code inserts data into a PostgreSQL database, and to emphasize the randomness of this error: I've received the 503 after inserting as few as 1200 entries (rows) and as many as 4200. There doesn't seem to be any pattern to when this exception is raised, and I can't make sense of it.</p>

<p>If it helps, here is a portion of my code:</p>

<pre><code># -*- coding: utf-8 -*-
import scrape_tools
import psycopg2
import psycopg2.extras
import urllib
import urllib2
import json
import cookielib
import time

tools = scrape_tools.tool_box()
db = tools.db_connect()
psycopg2.extras.register_hstore(db)
cursor = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cookiejar),
)

url = 'http://www.website.com/'
soup = tools.request(url)

type_select = soup('select', {'id': 'type'})
for option_tag in type_select:
    select_option = option_tag('option')
    for option_contents in select_option:
        if 'Select' in option_contents.contents[0]:
            continue
        type = option_contents.contents[0]
        type_val = option_contents['value']
        print 'Type', type
        get_more_url = 'http://www.website.com/' + type_val
        request2 = urllib2.Request(get_more_url)
        fp2 = opener.open(request2)
        html2_object = fp2.read()
        json_result = json.loads(html2_object)
        for json_dict in json_result:
            for json_key in json_dict:
                if len(json_key) == 0:
                    continue
                more_data = json_dict[json_key]
                print ' ', more_data
# (---Out of courtesy, I'll stop here--)
</code></pre>

<p>(Please note, <code>scrape_tools</code> is a custom module.)</p>

<p>Am I missing something with cookie storage? Am I missing something obvious? I can't figure out why this is happening. I've spent hours googling and searching Stack Overflow for somebody with similar issues, but haven't found anything.</p>

<p>I've also used Selenium to scrape data in the past and have that in my pocket as a last resort, but this project is huge and I'd rather not have Firefox eating up memory on the server for a week.</p>
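<p>For context, intermittent 503s in a scraping loop like this are commonly handled by retrying with exponential backoff rather than failing outright. Below is a minimal sketch of that pattern; the <code>fetch_with_retry</code> helper and its parameters are my own illustration (not part of the question's code), and it only assumes the opener raises an exception carrying a <code>code</code> attribute, as <code>urllib2.HTTPError</code> does:</p>

```python
import time

def fetch_with_retry(open_fn, url, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call open_fn(url); on an HTTP 503, wait with exponential backoff and retry.

    open_fn is any callable that fetches a URL (e.g. opener.open); exceptions
    whose .code attribute is not 503 are re-raised immediately.
    """
    for attempt in range(max_retries):
        try:
            return open_fn(url)
        except Exception as exc:
            # Retry only on 503 (service unavailable); anything else is fatal.
            if getattr(exc, 'code', None) != 503:
                raise
            # Give up after the final attempt.
            if attempt == max_retries - 1:
                raise
            # Back off: 1s, 2s, 4s, ... before the next try.
            sleep(base_delay * (2 ** attempt))
```

<p>In the loop above, <code>fp2 = opener.open(request2)</code> would become <code>fp2 = fetch_with_retry(opener.open, request2)</code>, so a transient 503 pauses the crawl instead of killing it.</p>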