  1. Logging in, navigating, and extracting text on SSL site using Python?
    <h2>Precursor: I asked a similar question yesterday <a href="https://stackoverflow.com/questions/18134834/extracting-and-parsing-html-from-a-secure-website-with-python">here</a>. I am not editing that question because, although the two are similar, this one is far more advanced.</h2>
    <p><strong>My Project</strong>: Using Python, I want to log on to a secure website, navigate to several pages within that session, and extract text from those pages into a file.</p>
    <p><strong>The Details</strong>: Here is all the information I have gathered and the code I have written.</p>
    <p>Here are the portions of the secured site's logon page that are worth noting:</p>
<pre><code>&lt;form action="index.asp" method="post" name="form"&gt;
    &lt;input type="text" id="user" name="user""&gt;
    &lt;input type="password" name="password"&gt;
    &lt;input type="hidden" name="logon" value="username"&gt;
    &lt;input type="submit" name="submit" value="Log In" class="button"&gt;
&lt;/form&gt;
</code></pre>
    <p>There is also JavaScript code on the page that checks for cookies, so I know I'll need <code>cookielib.CookieJar()</code>.</p>
    <h2>BIG EDIT</h2>
    <p>I am importing the following modules: <code>urllib</code>, <code>urllib2</code>, <code>cookielib</code> and <code>nltk</code>.</p>
    <p>With those, I wrote the following code:</p>
<pre><code>cookiejar = cookielib.CookieJar()

# Notice I set 'debug' to 'true'.
debug = True
handlers = [
    urllib2.HTTPHandler(debuglevel=debug),
    urllib2.HTTPSHandler(debuglevel=debug),
    urllib2.HTTPCookieProcessor(cookiejar),
]
opener = urllib2.build_opener(*handlers)

# These headers I copied directly from Chrome's Developer Tools
opener.addheaders = [
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Encoding", "gzip,deflate,sdch"),
    ("Accept-Language", "en-US,en;q=0.8"),
    ("Cache-Control", "max-age=0"),
    ("Connection", "keep-alive"),
    ("Content-Type", "application/x-www-form-urlencoded"),
    ("Host", "www.myebill.com"),
    ("Origin", "https://www.myebill.com"),
    ("Referer", "https://www.myebill.com/index.asp?startnam"),
    ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36")
]
urllib2.install_opener(opener)

# Passing the form data as a URL-encoded string
payload = "user=&lt;User&gt;&amp;password=&lt;Password&gt;&amp;logon=username&amp;submit=Log+In"
req = urllib2.Request("https://www.myebill.com/index.asp", data=payload)
cookiejar.add_cookie_header(req)
page = urllib2.urlopen(req)
pdata = page.read()
print(nltk.clean_html(pdata))
</code></pre>
    <p><strong>NOTE</strong>: If you would like me to post the debug output, just ask. :)</p>
    <p><strong>My Problem</strong>: After running my code, I <em>still</em> get a "Your session has either timed out or you have not logged on correctly." message.</p>
    <p>Help please? I tried learning mechanize, but the only documentation I can find online is convoluted and confusing. Any suggestions or code would be appreciated.</p>
    <p>Also, when I do find the answer, I promise to post my complete code as an edit for anyone who needs it as a reference (omitting logon information, of course).</p>
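    <p>For reference, here is a minimal sketch of the same request setup with the form fields encoded by <code>urlencode</code> instead of a hand-written string. It is written with the Python 3 module names (<code>urllib.request</code>, <code>urllib.parse</code>, <code>http.cookiejar</code>), which are an assumption on my part; the code in the question uses the Python 2 equivalents. The helper names <code>build_session</code> and <code>build_login_request</code> are made up for illustration, and this only builds the request; it is not claimed to fix the logon problem.</p>

```python
# Python 3 sketch; the question's code uses the Python 2 names (urllib2, cookielib).
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Form action taken from the logon page quoted in the question.
LOGIN_URL = "https://www.myebill.com/index.asp"


def build_session():
    """Return (opener, cookiejar). Hypothetical helper.

    The HTTPCookieProcessor stores cookies from responses in the jar and
    attaches them to later requests made through the same opener.
    """
    cookiejar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(cookiejar))
    return opener, cookiejar


def build_login_request(user, password):
    """Build (but do not send) the POST request for the logon form."""
    fields = {
        "user": user,
        "password": password,
        "logon": "username",  # hidden field from the form
        "submit": "Log In",   # value of the submit button
    }
    # urlencode escapes special characters and turns spaces into '+'.
    payload = urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(LOGIN_URL, data=payload)


if __name__ == "__main__":
    opener, cookiejar = build_session()
    req = build_login_request("<User>", "<Password>")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

    <p>Calling <code>opener.open(req)</code> would send the request, and reusing the same <code>opener</code> for the following pages keeps the session cookies attached to them.</p>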