Beautiful Soup Page Source Error
    <p>I am trying to fetch the HTML source from this URL: <a href="http://books.google.com/books?id=NZlV0M5Ije4C&amp;dq=isbn:0470284889" rel="nofollow">http://books.google.com/books?id=NZlV0M5Ije4C&amp;dq=isbn:0470284889</a></p>
    <p>I used the following code:</p>
    <pre><code>#!/usr/bin/env python
# Python 2 (urllib2, print statement)
import urllib2
from bs4 import BeautifulSoup

address = 'http://books.google.com/books?id=NZlV0M5Ije4C&amp;dq=isbn:0470284889'

def getPageSoup(address):
    # Send a browser-like User-Agent with the request
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)'})
    urlfile = urllib2.urlopen(request)
    page = urlfile.read()
    urlfile.close()
    print 'soup has been obtained!'
    return BeautifulSoup(page)

soup2 = getPageSoup(address)
metadata = soup2.findAll("metadata_row")  # this content is present when viewing from the web browser
</code></pre>
    <p>However, the HTML in soup2 looks hardly anything like the source of the Google Books page:</p>
    <pre><code>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
&lt;html&gt;&lt;head&gt;&lt;title&gt;Quantitative Trading: How to Build Your Own Algorithmic Trading Business - Ernie Chan - Google Books&lt;/title&gt;
&lt;script&gt;(function(){function a(c){this.t={};this.tick=function(c,e,b){b=void 0!=b?b:(new Date).getTime();this.t[c]=[b,e]};this.tick("start",null,c)}var d=new a;window.jstiming={Timer:a,load:d};try{var f=null;window.chrome&amp;amp;&amp;amp;window.chrome.csi&amp;amp;&amp;amp;(f=Math.floor(window.chrome.csi().pageT));null==f&amp;amp;&amp;amp;window.gtbExternal&amp;amp;&amp;amp;(f=window.gtbExternal.pageT());null==f&amp;amp;&amp;amp;window.external&amp;amp;&amp;amp;(f=window.external.pageT);f&amp;amp;&amp;amp;(window.jstiming.pt=f)}catch(g){};})();
&lt;/script&gt;
&lt;link href="/books/css/_9937a87cb2905e754d8d5e36995f224d/kl_about_this_book_kennedy_full_bundle.css" rel="stylesheet" type="text/css"/&gt;&lt;/head&gt;&lt;/html&gt;
</code></pre>
    <p>The HTML source I get through urllib2 and the source shown in my web browser are very different. How can I get the correct page source?</p>
    <p>Thanks!</p>
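One way to narrow this down is to check whether the expected content is already missing from the raw response, or only goes missing once the text is parsed and searched. The sketch below is a minimal diagnostic, not a fix. It uses only the Python 3 standard library rather than the Python 2 and bs4 setup from the question. The names `ClassCounter` and `diagnose` are hypothetical helpers introduced for illustration, and it assumes that `metadata_row` is a CSS class on the page rather than a tag name.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags that carry a given CSS class (hypothetical helper)."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def diagnose(raw_html, marker):
    """Report where `marker` (assumed to be a class name) goes missing."""
    if marker not in raw_html:
        # The server never sent it: look at headers, cookies, or scripts
        return "missing from response"
    counter = ClassCounter(marker)
    counter.feed(raw_html)
    counter.close()
    if counter.count == 0:
        # The text is there, but no tag has it as a class attribute
        return "present in response, but not as a class"
    return "found"


if __name__ == "__main__":
    sample = '<table><tr class="metadata_row"><td>Title</td></tr></table>'
    print(diagnose(sample, "metadata_row"))
```

If the marker is present in the raw text, the fetch itself is probably fine and the difference comes from parsing or searching. In Beautiful Soup, `findAll("metadata_row")` matches tags named `metadata_row`, while `find_all(class_="metadata_row")` matches by class. Passing an explicit parser such as `BeautifulSoup(page, "html5lib")` may also help if the default parser stops early on this page's markup.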