<p>The mechanize page is not suggesting that you can emulate JavaScript in Python. It is saying that you can change a hidden field in a form, thus tricking the web server into believing that a human<sup><a href="http://groups.google.com/group/nzpug/browse_thread/thread/a10538fb635d5786" rel="nofollow">1</a></sup> has selected the field. You still need to analyse the target yourself.</p>
<p>There will be no Python-based solution to this problem, unless you wish to create a JavaScript interpreter in Python.</p>
<p><a href="http://groups.google.com/group/nzpug/browse_thread/thread/a10538fb635d5786" rel="nofollow">My thoughts</a> on this problem have led me to three possible solutions:</p>
<ol>
<li>create an <a href="https://developer.mozilla.org/en/XULRunner" rel="nofollow">XULRunner</a> application</li>
<li>browser automation</li>
<li>attempt to interpret the client-side code</li>
</ol>
<p>Of those three, I've only really seen discussion of 2. I've seen something close to 1 in a commercial scraping application, where you basically create scripts by browsing sites and selecting the things on each page that you would like the script to extract in the future.</p>
<p>1 could possibly be made to work with a Python script by accepting a serialisation (JSON?) of <a href="http://www.wsgi.org/wsgi/What_is_WSGI" rel="nofollow">wsgi</a> Request objects, getting the app to fetch the URL, then sending the processed page back as a wsgi Response object. You could possibly wrap some middleware around urllib2 to achieve this. Probably overkill, but kind of fun to think about.</p>
<p>2 is usually achieved via <a href="http://seleniumhq.org/projects/remote-control/" rel="nofollow">Selenium RC</a> (Remote Control), a testing-centric tool. It provides a few methods like <code>getHtmlSource</code>, but most people I've heard from who use it don't like its API.</p>
<p>3 I have no idea about. <a href="http://nodejs.org/" rel="nofollow">node.js</a> is very hot right now, but I haven't touched it. I've never been able to build <a href="http://www.mozilla.org/js/spidermonkey/" rel="nofollow">spidermonkey</a> on my Ubuntu machine, so I haven't touched that either. My hunch is that in order to do this, you would provide the HTML source and your details to a JS interpreter, which would need to fake being your User-Agent, etc., in case the JavaScript wanted to reconnect to the server.</p>
<p><sup><a href="http://groups.google.com/group/nzpug/browse_thread/thread/a10538fb635d5786" rel="nofollow">1</a></sup> Well, more technically, a JavaScript-compliant User-Agent, which is almost always a web browser used by a human.</p>
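<p>To make the mechanize point concrete, here is a minimal sketch using only the standard library (not mechanize's own API): it parses a form, keeps its hidden inputs, overrides one of them, and builds the POST body a browser would have sent. The page, the <code>/search</code> action and the <code>js_enabled</code> field are hypothetical stand-ins for whatever your target actually uses, and checkbox/radio/select handling is deliberately left out.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect the action and named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.done = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.done:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Hidden inputs are included just like visible ones; checkbox and
            # radio "checked" state is ignored to keep the sketch short.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form = False
            self.done = True  # only the first form is collected


def build_post_body(html, overrides):
    """Return (action, urlencoded body) with selected field values replaced."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    fields.update(overrides)  # e.g. set a hidden field that JavaScript would normally fill in
    return parser.action, urlencode(fields)


# Hypothetical page: a script would normally flip js_enabled to "1".
PAGE = """
<form action="/search" method="post">
  <input type="text" name="q" value="python">
  <input type="hidden" name="js_enabled" value="0">
</form>
"""

action, body = build_post_body(PAGE, {"js_enabled": "1"})
print(action)  # /search
print(body)    # q=python&js_enabled=1
```

<p>mechanize wraps the same idea in its form API, but either way you still have to read the page's JavaScript yourself to know which value the server expects.</p>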
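<p>Option 1 hinges on passing requests between the Python script and the XULRunner app as plain data. Below is a minimal sketch of just that serialisation step, assuming Python 3's <code>urllib.request</code> (the successor to urllib2) on the Python side. The JSON field names are invented for illustration, and nothing here talks to a real XULRunner app.</p>

```python
import json
from urllib.request import Request


def request_to_json(req):
    """Serialise a urllib Request into a JSON message.

    The message format ("method", "url", "headers", "body") is hypothetical;
    the receiving app would need to agree on it.
    """
    return json.dumps({
        "method": req.get_method(),
        "url": req.full_url,
        "headers": dict(req.header_items()),
        # Assumes a text body; binary uploads would need base64 instead.
        "body": req.data.decode("utf-8") if req.data is not None else None,
    })


def json_to_request(message):
    """Rebuild an equivalent urllib Request from a JSON message."""
    msg = json.loads(message)
    body = msg["body"]
    data = body.encode("utf-8") if body is not None else None
    return Request(msg["url"], data=data, headers=msg["headers"], method=msg["method"])


# Usage: the Python side would send `message` to the app, which fetches the
# page (running its JavaScript) and replies with the processed HTML.
original = Request("http://example.com/page", data=b"q=python",
                   headers={"User-Agent": "Mozilla/5.0"})
message = request_to_json(original)
clone = json_to_request(message)
print(clone.get_method(), clone.full_url)  # POST http://example.com/page
```

<p>The reply could use a matching message carrying status, headers and the rendered HTML, which the Python side would then hand back to its caller as the response.</p>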