
<p>Interestingly, the problem isn't a redirect. The page modifies its content using JavaScript, but <code>urllib2</code> doesn't have a <code>JS</code> engine; it just <code>GET</code>s the data. If you disable JavaScript in your browser, you will notice that it loads basically the same content as what <code>urllib2</code> returns.</p>
<pre><code>import urllib2
from BeautifulSoup import BeautifulSoup

bostonPage = urllib2.urlopen("http://www.tripadvisor.com/HACSearch?geo=34438#02,1342106684473,rad:S0,sponsors:ABEST_WESTERN,style:Szff_6")
soup = BeautifulSoup(bostonPage)
# A BeautifulSoup object has no read() method; serialize the parsed tree instead
open('test.html', 'w').write(str(soup))
</code></pre>
<p>Comparing <code>test.html</code> with the page loaded with JavaScript disabled in your browser (easiest in Firefox: Content -&gt; uncheck Enable JavaScript) gives identical result sets.</p>
<p>So what can we do? First, we should check whether the site offers an API, since scraping tends to be frowned upon: <a href="http://www.tripadvisor.com/help/what_type_of_tripadvisor_content_is_available" rel="nofollow noreferrer">http://www.tripadvisor.com/help/what_type_of_tripadvisor_content_is_available</a></p>
<p>Judging from <a href="https://stackoverflow.com/questions/1185961/travel-hotel-apis">Travel/Hotel API&#39;s?</a>, it looks like they might, though with some restrictions.</p>
<p>But if we still need to scrape it, JavaScript and all, we can use <code>selenium</code> <a href="http://seleniumhq.org/" rel="nofollow noreferrer">http://seleniumhq.org/</a>. It's mainly used for testing, but it's easy to use and has fairly good docs.</p>
<p>I also found <a href="https://stackoverflow.com/questions/3362859/scraping-websites-with-javascript-enabled">Scraping websites with Javascript enabled?</a> and <a href="http://grep.codeconsult.ch/2007/02/24/crowbar-scrape-javascript-generated-pages-via-gecko-and-rest/" rel="nofollow noreferrer">http://grep.codeconsult.ch/2007/02/24/crowbar-scrape-javascript-generated-pages-via-gecko-and-rest/</a></p>
<p>Hope that helps.</p>
<p>As a side note, with <code>bs4</code> you can read the response once, parse it, and save the raw HTML for the comparison:</p>
<pre><code>&gt;&gt;&gt; import urllib2
&gt;&gt;&gt; from bs4 import BeautifulSoup
&gt;&gt;&gt; bostonPage = urllib2.urlopen("http://www.tripadvisor.com/HACSearch?geo=34438#02,1342106684473,rad:S0,sponsors:ABEST_WESTERN,style:Szff_6")
&gt;&gt;&gt; value = bostonPage.read()
&gt;&gt;&gt; soup = BeautifulSoup(value, "html.parser")
&gt;&gt;&gt; open('test.html', 'w').write(value)
</code></pre>
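The JavaScript diagnosis above can be reproduced offline. This is a minimal sketch for Python 3 using only the standard library, so <code>html.parser</code> stands in for BeautifulSoup and no request is made. <code>SERVER_HTML</code> is a made-up page, not TripAdvisor's real markup, and <code>TextCollector</code> / <code>visible_text</code> are hypothetical helpers. A parser that only receives the server's bytes sees the heading but never the "Hotel A" text, because nothing executes the <code>&lt;script&gt;</code> that would insert it. The last lines also split the answer's URL at <code>#</code>: HTTP clients do not send the fragment, so whatever state is encoded after it (for example <code>sponsors:ABEST_WESTERN</code>) is never seen by the server and can only be acted on client-side by the page's JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Hypothetical page as a server might send it: the results container is
# empty, and a script fills it in only when a browser runs the JavaScript.
SERVER_HTML = """
<html><body>
  <h1>Hotels in Boston</h1>
  <div id="results"></div>
  <script>
    document.getElementById("results").textContent = "Hotel A";
  </script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collects text nodes, skipping <script> bodies (which are never run)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.texts.append(data.strip())


def visible_text(html):
    """Return the non-script text a plain (non-JS) parser can see."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


url = ("http://www.tripadvisor.com/HACSearch?geo=34438"
       "#02,1342106684473,rad:S0,sponsors:ABEST_WESTERN,style:Szff_6")
# HTTP clients strip the fragment; only `requested` reaches the server.
requested, fragment = urldefrag(url)

print(visible_text(SERVER_HTML))  # prints ['Hotels in Boston']; "Hotel A" is missing
print(requested)                  # prints http://www.tripadvisor.com/HACSearch?geo=34438
print(fragment)
```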

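For the Selenium route suggested above, here is a minimal sketch, assuming Python 3 with Selenium 4's Python bindings, Firefox and geckodriver installed. <code>fetch_rendered_html</code> and its <code>wait_for_css</code> parameter are hypothetical names; you would have to find a real CSS selector for the results on the page yourself. Because a real browser runs the page's JavaScript, <code>page_source</code> returns the DOM after the scripts have modified it, which is what <code>urllib2</code> cannot give you.

```python
def fetch_rendered_html(url, wait_for_css=None, timeout=10):
    """Load `url` in headless Firefox and return the DOM after JS has run.

    Hypothetical helper: assumes Selenium 4, Firefox and geckodriver.
    """
    # Imported lazily so this module can be loaded where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)  # by default blocks until the page's load event
        if wait_for_css:
            # Results injected later by JS: wait until an element matching
            # the (hypothetical) selector is present in the DOM.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_for_css))
            )
        return driver.page_source  # serialized current DOM, after scripts ran
    finally:
        driver.quit()  # always shut the browser down
```

The returned string can be handed to BeautifulSoup just like the <code>urllib2</code> response, e.g. <code>BeautifulSoup(fetch_rendered_html(url), "html.parser")</code>.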