
Making BeautifulSoup ignore contents inside script tags
<p>I have been trying to get BeautifulSoup (3.1.0.1) to parse an HTML page that contains a lot of JavaScript which generates HTML inside <code>&lt;script&gt;</code> tags. One example fragment looks like this:</p>
<pre><code>&lt;html&gt;&lt;head&gt;&lt;body&gt;&lt;div&gt;
&lt;script type='text/javascript'&gt;
    if(ii &gt; 0) {
        html += '&lt;span id="hoverMenuPosSepId" class="hoverMenuPosSep"&gt;|&lt;/span&gt;'
    }
    html += '&lt;div class="hoverMenuPos" id="hoverMenuPosId" onMouseOver=\"menuOver_3821();\" ' +
        'onMouseOut=\"menuOut_3821();\"&gt;';
    if (children[ii].uri == location.pathname) {
        html += '&lt;a class="hiHover" href="' + children[ii].uri + '" ' + onClick + '&gt;';
    } else {
        html += '&lt;a class="hover" href="' + children[ii].uri + '" ' + onClick + '&gt;';
    }
    html += children[ii].name + '&lt;/a&gt;&lt;/div&gt;';
  }
}
hp = document.getElementById("hoverpopup_3821");
hp.style.top = (parseInt(hoveritem.offsetTop) + parseInt(hoveritem.offsetHeight)) + "px";
hp.style.visibility = "Visible";
hp.innerHTML = html;
}
return false;
}
function menuOut_3821() {
    timeOn_3821 = setTimeout("showSelected_3821()", 1000)
}
var timeOn_3821 = null;
function menuOver_3821() {
    clearTimeout(timeOn_3821)
}
function showSelected_3821() {
    showChildrenMenu_3821(
        document.getElementById("flatMenuItemAnchor" + selectedPageId), selectedPageId);
}
&lt;/script&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>BeautifulSoup doesn't seem to be able to deal with this and complains about a "malformed start tag" around <code>onMouseOver=<strong>\"</strong>menuOver_3821();\"</code>. It seems to be trying to parse the HTML that the JavaScript generates inside the script block.</p>
<p>Any ideas how to make BeautifulSoup ignore the contents of script tags?</p>
<p>I have seen other suggestions to use lxml, but I can't, since this has to run on Google App Engine.</p>
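One possible workaround, sketched here under assumptions rather than as a BeautifulSoup feature, is to strip every `<script>` element from the markup with a regular expression before handing the string to BeautifulSoup, so the parser never sees the JavaScript-generated HTML. It uses only the standard-library `re` module, so it adds no dependency like lxml. The names `SCRIPT_RE`, `strip_scripts` and `PAGE` are hypothetical, and the sample page is a shortened version of the fragment above.

```python
import re

# Hypothetical pattern: a whole <script ...>...</script> element, matched
# case-insensitively and across newlines (DOTALL). The body is matched
# non-greedily so separate script blocks are removed one at a time instead
# of swallowing everything between the first <script> and the last </script>.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(markup: str) -> str:
    """Return markup with every <script> element (tags and body) removed."""
    return SCRIPT_RE.sub("", markup)


# Shortened sample based on the question's fragment.
PAGE = r"""<html><head><body><div>
<script type='text/javascript'>
html += '<div class="hoverMenuPos" id="hoverMenuPosId" onMouseOver=\"menuOver_3821();\" ' +
        'onMouseOut=\"menuOut_3821();\">';
</script>
<p>visible text</p>
</div></body></html>"""

cleaned = strip_scripts(PAGE)
print(cleaned)
# The cleaned string would then be passed to BeautifulSoup as usual,
# e.g. soup = BeautifulSoup(cleaned)
```

The trade-off is that the script text is discarded entirely, so this only fits when you don't need to inspect the JavaScript itself. A regex is also not a full HTML parser: each match simply ends at the first `</script>` it finds.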