HTMLParser or urllib2 Unicode issue
<p>I am trying to use HTMLParser and urllib2 to fetch an image file:</p>
<pre><code>content = urllib2.urlopen( imgurl.encode('utf-8') ).read()
try:
    p = MyHTMLParser( )
    p.feed( content )
    p.download_file( )
    p.close()
except Exception,e:
    print e
</code></pre>
<p>MyHTMLParser:</p>
<pre><code>class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.url=""
        self.outfile = "some.png"

    def download_file(self):
        urllib.urlretrieve( self.url, self.outfile )

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # after some manipulation here, self.url will have a img url
            self.url = "http://somewhere.com/Fondue%C3%A0.png"
</code></pre>
<p>When I run the script, I get:</p>
<pre><code>Traceback (most recent call last):
  File "test.py", line 59, in &lt;module&gt;
    p.feed( data )
  File "/usr/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 158, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/HTMLParser.py", line 305, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/usr/lib/python2.7/HTMLParser.py", line 472, in unescape
    return re.sub(r"&amp;(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 56: ordinal not in range(128)
</code></pre>
<p>Following suggestions I found elsewhere, I added the <code>.encode('utf-8')</code> call, but I still get the error. How can I fix this? Thanks.</p>
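<p>The traceback ends inside <code>unescape</code>, which suggests the parser was fed a raw byte string containing non-ASCII bytes (0xc3 is the first byte of a UTF-8 encoded "à"). One commonly suggested approach is to decode the downloaded bytes to text before calling <code>feed()</code>. The following is only a minimal sketch of that idea, not a confirmed fix for this script: it is written for Python 3 (where the module is <code>html.parser</code>), it assumes the page is UTF-8 encoded, and <code>LinkParser</code>, <code>extract_links</code>, and the sample markup are hypothetical names and data introduced for illustration.</p>

```python
from html.parser import HTMLParser  # Python 2 equivalent: from HTMLParser import HTMLParser


class LinkParser(HTMLParser):
    """Hypothetical stand-in for MyHTMLParser: collects href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def extract_links(raw_bytes, encoding="utf-8"):
    # Decode first, so the parser only ever sees text and never raw bytes.
    # The encoding is an assumption; a real page may declare a different one.
    text = raw_bytes.decode(encoding)
    parser = LinkParser()
    parser.feed(text)
    parser.close()
    return parser.urls


# Sample bytes standing in for urlopen(...).read(): the attribute value holds
# both a non-ASCII character and an entity, the combination that exercises
# the unescape step shown in the traceback.
sample = '<a href="http://somewhere.com/Fondue\u00e0.png?a=1&amp;b=2">x</a>'.encode("utf-8")
links = extract_links(sample)
```

<p>In the Python 2 script from the question, the analogous change would presumably be <code>p.feed( content.decode('utf-8') )</code>, again assuming the response really is UTF-8.</p>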