
  1. Beautiful Soup decode error
    <p>For work, I need to parse a site with Beautiful Soup. The site is <a href="http://www.manta.com" rel="nofollow">http://www.manta.com</a>, but when I look for the encoding in the meta tags of the HTML, nothing is declared. I am trying to parse the HTML locally, from a downloaded copy of the page, but I am running into decoding errors:</p>
<pre class="lang-py prettyprint-override"><code>from bs4 import BeautifulSoup

# manta web page downloaded before
html = open('1.html', 'r')
soup = BeautifulSoup(html, 'lxml')
</code></pre>
    <p>This produces the following stack trace:</p>
<pre class="lang-py prettyprint-override"><code>Traceback (most recent call last):
  File "E:/Projects/Python/webkit/sample.py", line 10, in &lt;module&gt;
    soup = BeautifulSoup(html, 'lxml')
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 172, in __init__
    self._feed()
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 185, in _feed
    self.builder.feed(self.markup)
  File "C:\Python27\lib\site-packages\bs4\builder\_lxml.py", line 195, in feed
    self.parser.close()
  File "parser.pxi", line 1209, in lxml.etree._FeedParser.close(src\lxm\lxml.etree.c:90717)
  File "parsertarget.pxi", line 142, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:100104)
  File "parsertarget.pxi", line 130, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:99927)
  File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:9387)
  File "saxparser.pxi", line 259, in lxml.etree._handleSaxData (src\lxml \lxml.etree.c:96065)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 105-106: invalid data
</code></pre>
    <p>I tried passing the encoding to the Beautiful Soup constructor:</p>
<pre class="lang-py prettyprint-override"><code>soup = BeautifulSoup(html, 'lxml', from_encoding="some encoding")
</code></pre>
    <p>but I still get the same error.</p>
    <p>Interestingly, if I load the page in my browser, change the encoding to UTF-8 (for example in Firefox), and then save it, parsing works. Any help is greatly appreciated. Thank you.</p>
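    <p>The browser workaround described above suggests that the saved file contains bytes that are not valid UTF-8. One direction that may be worth trying, offered only as a sketch and not confirmed for this particular page, is to read the file as raw bytes, decode it explicitly, and hand the resulting Unicode text to the parser. The helper name <code>decode_html</code> and the list of candidate encodings are assumptions made for illustration; the actual encoding of the page is unknown. The sketch is written for Python 3 and uses only the standard library.</p>

```python
# Hypothetical helper: the name and the candidate encodings are illustrative
# assumptions, not something taken from the original post.
def decode_html(raw, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last resort never raises.
    return raw.decode("latin-1"), "latin-1"


# Bytes that are not valid UTF-8 (0xE9 is "é" in cp1252), standing in for
# the downloaded page.
sample = b"<p>caf\xe9</p>"
text, used = decode_html(sample)
print(used)

# With the real file, read bytes rather than text, then parse the decoded string:
# with open('1.html', 'rb') as f:
#     text, used = decode_html(f.read())
# soup = BeautifulSoup(text, 'lxml')
```

    <p>Trying strict encodings first and falling back to a permissive one keeps the parse from failing outright, at the cost of possibly mis-rendering a few characters if the guess is wrong.</p>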
 
