First, I'll just share some verbiage I noticed on the Netflix site under **Limitations on Use**:

*Any unauthorized use of the Netflix service or its contents will terminate the limited license granted by us and will result in the cancellation of your membership.*

In short, I'm not sure what your script does after this, but some activities could jeopardize your relationship with Netflix. I did not read the whole ToS, but you should.

That said, there are plenty of legitimate reasons to scrape HTML information, and I do it all the time. So my first bet with this specific problem is that you're using the wrong detection string... Just send a bogus email/password and print the response... Perhaps you made an assumption about what it looks like when you log in with a browser, but the browser is sending info that gets further into the process.

I wish I could offer specifics on what to do next, but I would rather not risk my relationship with 'flix to give a better answer to the question... so I'll just share a few observations I gleaned from scraping oodles of other websites that made it kind of hard to use web robots...

First, log in to your account with Firefox, and be sure to have the [Live HTTP Headers](https://addons.mozilla.org/en-us/firefox/addon/live-http-headers/) add-on enabled and in capture mode... what you will see when you log in live is *invaluable* to your scripting efforts... for instance, this was from a session while I logged in...

```
POST /Login HTTP/1.1
Host: signup.netflix.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: https://signup.netflix.com/Login?country=1&rdirfdc=true
--->Insert lots of private stuff here
Content-Type: application/x-www-form-urlencoded
Content-Length: 168

authURL=sOmELoNgTeXtStRiNg&nextpage=&SubmitButton=true&country=1&email=EmAiLAdDrEsS%40sOmEMaIlProvider.com&password=UnEnCoDeDpAsSwOrD
```

Pay particular attention to the stuff below the "Content-Length" field and *all* the parameters that come after it.

Now log back out, and pull up the login page again... chances are, you will see some of those fields hidden as state information in `<input type="hidden">` tags... some web apps keep state by feeding you fields and then using javascript to resubmit that same information in your login POST. I usually use lxml to parse the pages I receive... if you try it, keep in mind that lxml prefers utf-8, so I include code that automagically converts when it sees other encodings...

```python
import chardet
from urllib2 import urlopen

# req and data are the Request and urlencoded form body built for the login POST
response = urlopen(req, data)
# info is from the HTTP headers... like server version
info = response.info().dict
# page is the HTML response
page = response.read()
encoding = chardet.detect(page)['encoding']
if encoding != 'utf-8':
    page = page.decode(encoding, 'replace').encode('utf-8')
```

BTW, [Michael Foord](http://www.voidspace.org.uk/python/articles/urllib2.shtml) has a very good reference on urllib2 and many of the assorted issues.
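To make that hidden-field point concrete, here is a minimal sketch of how you might collect the `<input type="hidden">` values (such as `authURL`) from the login page with lxml and fold them into your form data before POSTing. The URL, field names, and credentials are placeholders taken from the capture above, not verified against what Netflix actually serves, so double-check them against your own Live HTTP Headers session.

```python
# Rough sketch (Python 2 / urllib2, to match the rest of this answer)
import urllib
import urllib2
from lxml import html

login_url = 'https://signup.netflix.com/Login'   # placeholder; use the page your browser loads

# Fetch and parse the login page
page = urllib2.urlopen(login_url).read()
tree = html.fromstring(page)

# Seed the form data with every hidden input, so state fields like authURL ride along
form_data = {}
for field in tree.xpath('//form//input[@type="hidden"]'):
    name = field.get('name')
    if name:
        form_data[name] = field.get('value', '')

# ...then add the fields you control
form_data['email'] = 'you@example.com'
form_data['password'] = 'not-my-real-password'

data = urllib.urlencode(form_data)   # this is the body you hand to urlopen()
```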
So, **in summary**:

1. Using your existing script, dump the results from a known bogus login to be sure you're parsing for the right info... I'm *pretty sure* you made a bad assumption above.
2. It also looks like you aren't submitting enough parameters in the POST. Experience tells me you need to set `authURL` in addition to `email` and `password`... if possible, I try to mimic what the browser sends (there is a sketch of this after the list).
3. Occasionally, it matters whether you have set your user-agent string and referring webpage. I always set these when I scrape so I don't waste cycles debugging.
4. When all else fails, look at the info stored in the cookies they send.
5. Sometimes websites base64-encode form submission data. I don't know whether Netflix does.
6. Some websites are very protective of their intellectual property, and programmatically reading/archiving the information is considered a theft of their IP. Again, read the ToS... I don't know how Netflix views what you want to do.
7. I am providing this for informational purposes and under no circumstances endorse or condone the violation of Netflix's terms of service... nor can I confirm whether your proposed activity would... I'm just saying it might :-). Talk to a lawyer who specializes in e-discovery if you need an official ruling. Feet first. Don't eat yellow snow... etc...
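Points 2–4 tend to interact, so here is one way they could be wired together with urllib2: a cookie-aware opener, a browser-like `User-Agent` and `Referer`, and a POST body that includes `authURL`. Treat it as a sketch under the assumption that the capture above is representative; the header values, URL, and form fields are copied from that capture and from the hidden-field sketch, not confirmed against Netflix's current login flow.

```python
# Sketch only: cookie jar + browser-like headers + the POST shown in the capture
import cookielib
import urllib
import urllib2

# Keep cookies between requests, since session state often lives there (point 4)
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

headers = {
    # Mimic the browser capture (point 3): user-agent string and referring page
    'User-Agent': ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.16) '
                   'Gecko/20110319 Firefox/3.6.16'),
    'Referer': 'https://signup.netflix.com/Login?country=1&rdirfdc=true',
    'Content-Type': 'application/x-www-form-urlencoded',
}

# form_data would normally come from the hidden-field sketch above (point 2);
# the values here are placeholders
form_data = {
    'authURL': 'value-scraped-from-the-hidden-input',
    'email': 'you@example.com',
    'password': 'not-my-real-password',
}

req = urllib2.Request('https://signup.netflix.com/Login',
                      urllib.urlencode(form_data), headers)
response = opener.open(req)
# Dump the response so you can compare a bogus login against a real one (point 1)
print response.read()
```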