StackOverflow2013


plurals

POLogging in, navigating, and extracting text on SSL site using Python?
primarykey
Id
18150043
data
AcceptedAnswerId
18152763
AnswerCount
2
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2013-08-09T15:14:18.090
FavoriteCount
0
LastActivityDate
2013-08-09T17:54:49.187
LastEditDate
2017-05-23T11:56:37.500
LastEditorUserId
-1
OwnerUserId
2646265
ParentId
0
PostTypeId
1
Score
0
ViewCount
597
LastEditorDisplayName
text
Body
<h2>Precursor: I asked a question similar to this yesterday <a href="https://stackoverflow.com/questions/18134834/extracting-and-parsing-html-from-a-secure-website-with-python">here</a>. My reason for not editing that question is that even though the two are similar, this one is far more advanced.</h2>
<p><strong>My Project</strong>: Using Python, I want to log on to a secure website, navigate to several pages within that session, and extract text from those pages into a file.</p>
<p><strong>The Details</strong>: Here is the information I have gathered and the code I have written.</p>
<p>Here are the portions of the secured site's logon page that are worth noting:</p>
<pre><code><form action="index.asp" method="post" name="form">
    <input type="text" id="user" name="user">
    <input type="password" name="password">
    <input type="hidden" name="logon" value="username">
    <input type="submit" name="submit" value="Log In" class="button">
</form>
</code></pre>
<p>There is also JavaScript code on the page checking for cookies, so I know I'll need <code>cookielib.CookieJar()</code>.</p>
<h2>BIG EDIT</h2>
<p>I am importing the following modules: <code>urllib</code>, <code>urllib2</code>, <code>cookielib</code> and <code>nltk</code>.</p>
<p>With those imported, I wrote the following code:</p>
<pre><code>cookiejar = cookielib.CookieJar()

# Notice I set 'debug' to 'true'.
debug = True

handlers = [
    urllib2.HTTPHandler(debuglevel=debug),
    urllib2.HTTPSHandler(debuglevel=debug),
    urllib2.HTTPCookieProcessor(cookiejar),
]

opener = urllib2.build_opener(*handlers)

# These headers I copied directly from Chrome's Developer Tools
opener.addheaders = [
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Encoding", "gzip,deflate,sdch"),
    ("Accept-Language", "en-US,en;q=0.8"),
    ("Cache-Control", "max-age=0"),
    ("Connection", "keep-alive"),
    ("Content-Type", "application/x-www-form-urlencoded"),
    ("Host", "www.myebill.com"),
    ("Origin", "https://www.myebill.com"),
    ("Referer", "https://www.myebill.com/index.asp?startnam"),
    ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36")
]

urllib2.install_opener(opener)

# Passing the form data as a URL-encoded string
payload = "user=<User>&password=<Password>&logon=username&submit=Log+In"

req = urllib2.Request("https://www.myebill.com/index.asp", data=payload)
cookiejar.add_cookie_header(req)

page = urllib2.urlopen(req)
pdata = page.read()

print( nltk.clean_html( pdata ) )
</code></pre>
<p><strong>NOTE</strong>: If you would like me to post the debug output, just ask. :)</p>
<p><strong>My Problem</strong>: After running my code, I <em>still</em> get a "Your session has either timed out or you have not logged on correctly." message.</p>
<p>Help please? I tried learning mechanize, but the only documentation I can find online is convoluted and confusing. Any suggestions or code would be appreciated.</p>
<p>Also, when I do find the answer, I promise to post my complete code as an edit for anyone who needs this as a reference! (Omitting logon information, of course.)</p>
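<p>For reference, here is a minimal sketch of the same logon flow using the Python 3 standard library (<code>urllib.request</code> and <code>http.cookiejar</code>, swapped in for <code>urllib2</code> and <code>cookielib</code>). It rests on two untested assumptions about this site: that the server issues a session cookie on the first GET of the logon page and expects it back with the POST, and that leaving out the <code>Accept-Encoding</code> header avoids a compressed response that would otherwise need decoding. The helper names <code>build_session</code>, <code>encode_login_form</code>, <code>read_body</code> and <code>log_in</code> are made up for illustration, and this may not be the actual cause of the error message.</p>

```python
import gzip
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://www.myebill.com/index.asp"  # URL taken from the question


def build_session():
    """Return (opener, cookiejar); the opener stores and resends cookies itself."""
    cookiejar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookiejar)
    )
    # Assumption: a browser-like User-Agent is sufficient. Accept-Encoding is
    # deliberately omitted so the server is not invited to compress the body.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, cookiejar


def encode_login_form(user, password):
    """URL-encode the four form fields shown in the question, as bytes."""
    fields = [
        ("user", user),
        ("password", password),
        ("logon", "username"),
        ("submit", "Log In"),
    ]
    return urllib.parse.urlencode(fields).encode("ascii")


def read_body(response):
    """Read a response body, undoing gzip if the server used it anyway."""
    raw = response.read()
    if response.headers.get("Content-Encoding", "") == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def log_in(opener, user, password, url=LOGIN_URL):
    """GET the logon page (to pick up any session cookie), then POST the form."""
    opener.open(url).close()
    with opener.open(url, data=encode_login_form(user, password)) as response:
        return read_body(response)
```

<p>With real credentials this would be called as <code>opener, jar = build_session()</code> followed by <code>html = log_in(opener, user, password)</code>. Later pages would then be fetched through the same <code>opener</code> so that the cookies in <code>jar</code> travel with each request.</p>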
Tags
<python><asp.net><authentication><extract><mechanize>
Title
Logging in, navigating, and extracting text on SSL site using Python?
singulars
PostAcceptedAnswerId
1. PO
  singulars
  PostTypePostTypeId
  PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USJacob Bridges
plurals
PostLinksPostIdRelatedPostId
1. PL
  singulars
  LinkTypeLinkTypeId
  LTLinked
2. PL
  singulars
  LinkTypeLinkTypeId
  LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
  singulars
  PostTypePostTypeId
  PTAnswer
2. PO
  singulars
  PostTypePostTypeId
  PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
