BeautifulSoup and Amazon.co.uk
<p>I am trying to parse Amazon to compile a list of prices, as part of a bigger project relating to statistics. However, I am stumped. Could anyone review my code and tell me where I went wrong?</p>
<pre><code>#!/usr/bin/python
# -*- coding: utf-8 -*-
import mechanize
from bs4 import BeautifulSoup

URL_00 = "http://www.amazon.co.uk/Call-Duty-Black-Ops-PS3/dp/B007WPF7FE/ref=sr_1_2?ie=UTF8&amp;qid=1352117194&amp;sr=8-2"

bro = mechanize.Browser()
resp = bro.open(URL_00)
html = resp.get_data()
soup_00 = BeautifulSoup(html)
price = soup_00.find('b', {'class':'priceLarge'})
print price  # this should return at the very least the text enclosed in a tag
</code></pre>
<p>According to the screenshot, what I wrote above should work, shouldn't it?</p>
<p><img src="https://i.stack.imgur.com/SXRUt.png" alt="http://i.imgur.com/bPVe1.png (cannot post an image as a newbie..)"></p>
<p>All I get in the printout is "[]". If I change the second-to-last line to this:</p>
<pre><code>price = soup_00.find('b', {'class':'priceLarge'}).contents[0].string
</code></pre>
<p>or</p>
<pre><code>price = soup_00.find('b', {'class':'priceLarge'}).text
</code></pre>
<p>I get a "NoneType" error.</p>
<p>I am quite confused as to why this is happening. The page encoding in the URL in Chrome says UTF8, which matches the encoding declared on line #2 of my script. I have changed it to ISO (as per the inner HTML of the page), but this makes zero difference, so I am positive encoding is not the issue here.</p>
<p>Also, I don't know if this is relevant at all, but my system locale on Linux being UTF-8 should not cause a problem, should it?</p>
<p>Any ideas would be welcome.</p>
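<p>The "NoneType" error described above is what Python reports when <code>find()</code> matches nothing and an attribute is then read from the resulting <code>None</code>. Below is a minimal sketch of checking the result before using it, written for Python 3 and run against two made-up HTML snippets rather than the real Amazon page. The function name <code>extract_price</code> and both sample strings are hypothetical and introduced only for illustration. It does not explain why the element is missing from the fetched page; it only makes that case visible.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the fetched page; this is not real Amazon markup.
PAGE_WITH_PRICE = '<div><b class="priceLarge">£29.99</b></div>'
PAGE_WITHOUT_PRICE = '<div><span class="other">no price here</span></div>'


def extract_price(html):
    """Return the text of the first <b class="priceLarge">, or None if absent."""
    # Naming the parser explicitly keeps the result independent of which
    # optional parsers happen to be installed.
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("b", {"class": "priceLarge"})
    if tag is None:
        # find() returns None when nothing matches, so reading .text here
        # would raise AttributeError on a NoneType object.
        return None
    return tag.get_text(strip=True)


print(extract_price(PAGE_WITH_PRICE))
print(extract_price(PAGE_WITHOUT_PRICE))
```

<p>If the second case occurs with the real page, one possible next step is to search the downloaded <code>html</code> string for <code>priceLarge</code> to see whether the markup the script received matches what the browser shows.</p>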