    <h2>Short Answer</h2>
    <p>Do this instead (assuming your documents are all well-formed XML):</p>
    <pre><code>import lxml.etree

# Parse with the XML parser rather than the HTML parser.
etx = lxml.etree.parse('test.html')
# tostring() returns bytes when an encoding is given, so decode before printing.
xml_bytes = lxml.etree.tostring(etx, xml_declaration=True,
                                encoding=etx.docinfo.encoding,
                                standalone=etx.docinfo.standalone)
print(xml_bytes.decode(etx.docinfo.encoding))
</code></pre>
    <h2>Explanation</h2>
    <p><code>test.html</code> is not actually valid HTML. It has empty elements and an XML processing instruction, neither of which HTML understands. The HTML parser interprets the XML processing instruction as an SGML processing instruction (these have the form <code>&lt;? ... &gt;</code> rather than the XML form <code>&lt;? ... ?&gt;</code>) with content <code>xml version="1.0" encoding="UTF-8" standalone="no"?</code>. The trailing question mark is therefore treated as part of the content, so when the document is reserialized as XML, the processing instruction ends with a doubled question mark, like so: <code>??&gt;</code></p>
    <p>Your results with the <code>html5lib</code> parser or serializer are a little better: when reserialized to XML, the processing instruction ends up inside a comment. This is because HTML5 doesn't allow SGML processing instructions either, and treats the XML preamble as garbage text to ignore.</p>
    <p>To get the results you want, parse and serialize your document with the XML parser (<code>lxml.etree</code>) instead. The document appears to be well-formed XML and valid XHTML 1.1. If you serialize with the HTML serializer instead (<code>lxml.html.tostring()</code>, not <code>lxml.html.etree.tostring()</code>), it will output a polyglot XHTML document.</p>
    <p>A wrinkle is that the serializer does not try to preserve the XML declaration exactly (it is, after all, not part of the XML infoset). You will have to pass the encoding and standalone values from the <code>docinfo</code> property to the <code>tostring()</code> method yourself.</p>
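    <p>The doubled question mark can be reproduced in a minimal sketch. Note the assumptions: this uses Python's standard-library <code>html.parser</code>, not lxml's HTML parser, so it only illustrates the SGML-style reading of <code>&lt;? ... &gt;</code> and is not a trace of what lxml does internally. The names <code>PICollector</code> and <code>collect_pis</code> are made up for this illustration.</p>

```python
from html.parser import HTMLParser


class PICollector(HTMLParser):
    """Hypothetical helper: records the raw content of each processing instruction."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_pi(self, data):
        # html.parser ends a PI at the first '>', SGML-style, so an
        # XML-style trailing '?' is kept as part of the content.
        self.instructions.append(data)


def collect_pis(markup):
    """Feed markup to the stdlib HTML parser and return the PI contents it saw."""
    parser = PICollector()
    parser.feed(markup)
    parser.close()
    return parser.instructions


declaration = '<?xml version="1.0" encoding="UTF-8" standalone="no"?>'
pis = collect_pis(declaration + "<html><body><p>hi</p></body></html>")
content = pis[0]

# Writing that content back out as an XML PI appends a second '?' before '>'.
reserialized = "<?" + content + "?>"
print(content)
print(reserialized)
```

    <p>Here the content handed to <code>handle_pi</code> still carries the trailing <code>?</code>, so wrapping it in XML processing-instruction delimiters yields a string ending in <code>??&gt;</code>, which is the same symptom described above.</p>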