BeautifulSoup4: Ampersand in text
<p>I have a problem using BeautifulSoup4... (I'm quite a Python/BeautifulSoup newbie, so forgive me if I'm dumb.)</p>
<p>Why does the following code:</p>
<pre><code>from bs4 import BeautifulSoup

soup_ko = BeautifulSoup('&lt;select&gt;&lt;option&gt;foo&lt;/option&gt;&lt;option&gt;bar &amp; baz&lt;/option&gt;&lt;option&gt;qux&lt;/option&gt;&lt;/select&gt;')
soup_ok = BeautifulSoup('&lt;select&gt;&lt;option&gt;foo&lt;/option&gt;&lt;option&gt;bar and baz&lt;/option&gt;&lt;option&gt;qux&lt;/option&gt;&lt;/select&gt;')

print soup_ko.find_all('option')
print soup_ok.find_all('option')
</code></pre>
<p>produce the following output:</p>
<pre><code>[&lt;option&gt;foo&lt;/option&gt;, &lt;option&gt;bar &amp;amp; baz&lt;/option&gt;]
[&lt;option&gt;foo&lt;/option&gt;, &lt;option&gt;bar and baz&lt;/option&gt;, &lt;option&gt;qux&lt;/option&gt;]
</code></pre>
<p>I was expecting the same result in both cases, a list of my three options, but BeautifulSoup seems to dislike the ampersand in the text: the option after it is dropped. How can I work around this and get the correct list without editing my HTML (or by transforming/converting it)?</p>
<p>Thanks,</p>
<p><strong>Edit:</strong> This seems to be a 4.2.0 bug. I downloaded both the 4.2.0 and 4.2.1 versions (from <a href="http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz" rel="nofollow">http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz</a> and <a href="http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.1.tar.gz" rel="nofollow">http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.1.tar.gz</a>), unpacked them in my script folder, and changed my code to:</p>
<pre><code>import sys

# Put the unpacked release named on the command line first on the import path.
sys.path.insert(0, "beautifulsoup4-" + sys.argv[1])

from bs4 import BeautifulSoup, __version__

print "Beautiful Soup %s" % __version__

soup_ko = BeautifulSoup('&lt;select&gt;&lt;option&gt;foo&lt;/option&gt;&lt;option&gt;bar &amp; baz&lt;/option&gt;&lt;option&gt;qux&lt;/option&gt;&lt;/select&gt;')
print soup_ko.find_all('option')
</code></pre>
<p>and got these results:</p>
<pre><code>15:24:38 pataluc ~ % python stack.py 4.2.0
Beautiful Soup 4.2.0
[&lt;option&gt;foo&lt;/option&gt;, &lt;option&gt;bar &amp;amp; baz&lt;/option&gt;]
15:24:41 pataluc ~ % python stack.py 4.2.1
Beautiful Soup 4.2.1
[&lt;option&gt;foo&lt;/option&gt;, &lt;option&gt;bar &amp;amp; baz&lt;/option&gt;, &lt;option&gt;qux&lt;/option&gt;]
</code></pre>
<p>So I guess my question is closed. Thanks for your comments, which made me realize it was a version issue.</p>
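<p>For anyone who wants to repeat the check against whatever Beautiful Soup is installed, here is a minimal sketch. The names <code>KNOWN_BAD_VERSIONS</code>, <code>is_known_bad</code> and <code>check_installed_bs4</code> are hypothetical helpers made up for illustration; only the two version numbers come from the test above. It also assumes the built-in <code>html.parser</code> backend, which may not be the parser used in the original run.</p>

```python
# Hypothetical helper names; only the version number "4.2.0" comes from the post.
KNOWN_BAD_VERSIONS = {"4.2.0"}  # lost the <option> that follows "bar & baz"

# Same markup as in the question, with a bare ampersand in the text.
HTML = ("<select><option>foo</option>"
        "<option>bar & baz</option>"
        "<option>qux</option></select>")


def is_known_bad(version):
    """Return True if this version string is the one that failed in the test above."""
    return version in KNOWN_BAD_VERSIONS


def check_installed_bs4():
    """Parse the test markup with the installed bs4.

    Returns the number of <option> tags found, or None if bs4 is not installed.
    """
    try:
        import bs4
    except ImportError:
        return None
    if is_known_bad(bs4.__version__):
        print("Beautiful Soup %s is the version that failed above" % bs4.__version__)
    # Assumption: name "html.parser" explicitly so the result does not depend
    # on which optional parsers (lxml, html5lib) happen to be installed.
    soup = bs4.BeautifulSoup(HTML, "html.parser")
    return len(soup.find_all("option"))


if __name__ == "__main__":
    found = check_installed_bs4()
    if found is None:
        print("bs4 is not installed")
    else:
        print("options found: %d (expected 3)" % found)
```

<p>If the count is lower than 3, upgrading Beautiful Soup is the first thing to try, since that is what resolved it here.</p>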
 
