<p>First, I can't reproduce the Mac results using Python 2.7.1 and BeautifulSoup 4.3.2; that is, I get the extra semicolon on all systems.</p>
<p>The easy fix is to a) use strictly valid HTML, or b) add a space after the ampersand. Chances are you can't change the source, and if you could parse out and replace these in Python you wouldn't need BeautifulSoup ;)</p>
<p>The problem is that the BeautifulSoupHTMLParser first converts <code>S&amp;P500</code> to <code>S&amp;P500;</code> because it assumes <code>P500</code> is a character entity name and you just forgot the semicolon.</p>
<p>Later it reparses the string and finds <code>&amp;P500;</code>. This time it doesn't recognize <code>P500</code> as a valid entity name, so it converts the <code>&amp;</code> to <code>&amp;amp;</code> without touching the rest, which leaves the stray semicolon in place.</p>
<p>Here is a crude monkeypatch <strong>only to demonstrate my point</strong>. I don't know the inner workings of BeautifulSoup well enough to propose a proper solution.</p>
<pre><code>from bs4 import BeautifulSoup
from bs4.builder._htmlparser import BeautifulSoupHTMLParser
from bs4.dammit import EntitySubstitution


def handle_entityref(self, name):
    # Look the name up in BeautifulSoup's table of known HTML entities.
    character = EntitySubstitution.HTML_ENTITY_TO_CHARACTER.get(name)
    if character is not None:
        data = character
    else:
        # Previously was
        # data = "&amp;%s;" % name
        # Unknown name: emit it without appending a semicolon.
        data = "&amp;%s" % name
    self.handle_data(data)


html = '&lt;td&gt;S&amp;P500&lt;/td&gt;'

# Pre monkeypatching
# &lt;td&gt;S&amp;amp;P500;&lt;/td&gt;
print(BeautifulSoup(html))

BeautifulSoupHTMLParser.handle_entityref = handle_entityref

# Post monkeypatching
# &lt;td&gt;S&amp;amp;P500&lt;/td&gt;
print(BeautifulSoup(html))
</code></pre>
<p>Hopefully someone more versed in bs4 can give you a proper solution. Good luck.</p>
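Fix (a) above, feeding the parser strictly valid HTML, can be approximated with a pre-processing pass when the source itself can't be changed. The following is only a minimal sketch and not part of the original answer: it assumes Python 3 (for the standard-library `html.entities` module), and the helper name `escape_unknown_entities` is hypothetical. It escapes an ampersand only when the name after it is not a known HTML entity, so the parser never sees `&P500` as a candidate entity reference.

```python
import re
from html.entities import name2codepoint

# "&" followed by something that looks like an entity name, with an optional ";".
# Numeric references such as "&#38;" and a bare "& " do not match and are left alone.
_AMP_NAME = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def escape_unknown_entities(markup):
    """Hypothetical helper: escape '&' when the following name is not a known entity."""
    def fix(match):
        name, semicolon = match.group(1), match.group(2)
        if name in name2codepoint:
            # A real entity such as &amp; or &lt;: keep it untouched.
            return match.group(0)
        # Unknown name: the ampersand was meant literally, so escape it.
        return "&amp;" + name + semicolon

    return _AMP_NAME.sub(fix, markup)


print(escape_unknown_entities("<td>S&P500</td>"))
# -> <td>S&amp;P500</td>
print(escape_unknown_entities("<td>AT&amp;T &lt; S&P500</td>"))
# -> <td>AT&amp;T &lt; S&amp;P500</td>
```

The cleaned string could then be handed to `BeautifulSoup` as usual. A regular expression like this is a blunt tool: it does not know about `<script>` blocks, attributes, or comments, so it is best treated as a workaround for simple table-style markup rather than a general solution.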