<p>Here is an example of stripping out all attributes and allowing only the tags in <code>[table, tr, td]</code>. I've added a few Unicode entities for the sake of illustration.</p>
<pre><code>DATA = '''&lt;table border="1"&gt;&lt;tr colspan="4"&gt;&lt;td rowspan="2"&gt;\r
224&amp;#13;
&amp;#8220;hi there&amp;#8221; 9:00 am\r
-3:00 pm&amp;#13;
NPHC Leadership&lt;/td&gt;\r
&lt;td rowspan="2"&gt;\r
&lt;font&gt;ALSO IN 223; WALL OPEN&lt;/font&gt;&lt;/td&gt;\r
&lt;/table&gt;'''

import lxml.html
from lxml.html import clean

def _clean_attrib(node):
    # Recursively remove every attribute from node and its descendants.
    for n in node:
        _clean_attrib(n)
    node.attrib.clear()

tree = lxml.html.fromstring(DATA)
cleaner = clean.Cleaner(allow_tags=['table','tr','td'],
                        remove_unknown_tags=False)
cleaner.clean_html(tree)
_clean_attrib(tree)

# tostring() returns bytes when given a byte encoding, so decode before printing.
print(lxml.html.tostring(tree, encoding='utf-8', pretty_print=True,
                         method='html').decode('utf-8'))
</code></pre>
<p>Result:</p>
<pre><code>&lt;table&gt;&lt;tr&gt;
&lt;td&gt;
224
“hi there” 9:00 am
-3:00 pm
NPHC Leadership&lt;/td&gt;
&lt;td&gt;
&lt;font&gt;ALSO IN 223; WALL OPEN&lt;/font&gt;
&lt;/td&gt;
&lt;/tr&gt;&lt;/table&gt;
</code></pre>
<p>Note that the <code>&lt;font&gt;</code> tag survives in this output. As far as I can tell, <code>clean_html</code> returns a cleaned copy rather than modifying <code>tree</code> in place, so the tag filter only takes effect if you keep the return value, for example <code>tree = cleaner.clean_html(tree)</code>.</p>
<p>Are you sure you want to strip out all entities? The <code>&amp;#13;</code> corresponds to a carriage return, and when lxml parses the document it converts all entities to their corresponding Unicode characters.</p>
<p>Whether entities show up also depends on the output method and encoding. For example, if you use <code>lxml.html.tostring(encoding='ascii', method='xml')</code>, the <code>'\r'</code> and Unicode characters will be output as entities:</p>
<pre><code>&lt;table&gt;
&lt;tr&gt;&lt;td&gt;&amp;#13;
&amp;#8220;hi there&amp;#8221;
...
</code></pre>
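The entity round trip described above can be sketched with the standard library alone. This is only an illustration of the idea, not lxml's implementation: `html.unescape` stands in for the parser's entity decoding, and the `xmlcharrefreplace` error handler stands in for ASCII serialization. The sample string is a fragment of the `DATA` value above, and the variable names are made up for this sketch.

```python
import html

# A fragment of the DATA string above, containing numeric character references.
raw = "224&#13;\n&#8220;hi there&#8221;"

# Decoding: each reference becomes the Unicode character it names,
# so &#13; turns into a literal carriage return.
decoded = html.unescape(raw)

# Encoding to ASCII: characters outside ASCII are written back as numeric
# references. '\r' is itself ASCII, so this codec leaves it as a raw byte;
# the XML serializer mentioned above escapes it as well.
ascii_bytes = decoded.encode("ascii", errors="xmlcharrefreplace")

# Encoding to UTF-8 needs no references at all.
utf8_bytes = decoded.encode("utf-8")

print(repr(decoded))
print(ascii_bytes)
print(utf8_bytes)
```

The point of the sketch is that the entities are not stored in the parsed text. They only reappear when the chosen output encoding cannot represent a character directly.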