  1. HTML Tidy with PHP code, XHTML is not valid XML afterwards
    <p>I'm using HTML Tidy (<a href="http://tidy.sourceforge.net/" rel="nofollow">http://tidy.sourceforge.net/</a>) to convert HTML to XHTML, and I want to transform this XHTML later with XSLT.</p>
    <p>Unfortunately, when I tried to parse a TechCrunch page (just for testing), I ran into a problem: the page contains PHP code, and HTML Tidy produces an XML file that is not well-formed because of it.</p>
    <p>Simplified input file <code>dirty.htm</code>:</p>
    <pre class="lang-xml prettyprint-override"><code>&lt;html&gt;
&lt;head&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;a href="http://www.crunchbase.com/company/google" onclick="&lt;?php tc_set_omniture_attr("post_widget_crunchbase") ?&gt;Google&lt;/a&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
    <p>and the output file produced by HTML Tidy, <code>cleaned.htm</code>:</p>
    <pre class="lang-xml prettyprint-override"><code>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
&lt;head&gt;
&lt;title&gt;&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;&lt;a href="http://www.crunchbase.com/company/google" onclick="&lt;?php tc_set_omniture_attr("&gt;Google&lt;/a&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
    <p>The main problem is the <code>&lt;</code> in the <code>onclick</code> value, which is not allowed inside an XML attribute value. xsltproc refuses to open this malformed XML.</p>
    <p>My HTML Tidy options, <code>tidyconfig.cfg</code>:</p>
    <pre><code>output-xhtml: 1
indent: 0
tidy-mark: 0
wrap: 0
alt-text:
doctype: strict
force-output: 1
numeric-entities: 1
clean: 1
bare: 1
word-2000: 1
drop-proprietary-attributes: 1
enclose-text: 1
logical-emphasis: 1
</code></pre>
    <p>HTML Tidy command line:</p>
    <pre><code>tidy -quiet -config tidyconfig.cfg -output cleaned.htm dirty.htm
</code></pre>
    <p>Did I miss an HTML Tidy option? All Tidy options are listed here: <a href="http://tidy.sourceforge.net/docs/quickref.html" rel="nofollow">http://tidy.sourceforge.net/docs/quickref.html</a></p>
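The failure can be reproduced without xsltproc. The following is only a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser stands in for the XSLT processor's XML reader. `CLEANED_FRAGMENT` is the anchor element copied from `cleaned.htm` above. `ESCAPED_FRAGMENT` and `is_well_formed` are hypothetical names introduced for illustration. The escaped variant only shows what a well-formed attribute value would look like; it is not something Tidy is known to produce.

```python
import xml.etree.ElementTree as ET

# The anchor element as it appears in cleaned.htm (copied from the question).
CLEANED_FRAGMENT = (
    '<p><a href="http://www.crunchbase.com/company/google" '
    'onclick="<?php tc_set_omniture_attr(">Google</a></p>'
)

# Hypothetical repaired variant for comparison: the raw "<" inside the
# attribute value is written as the entity "&lt;".
ESCAPED_FRAGMENT = CLEANED_FRAGMENT.replace('onclick="<', 'onclick="&lt;')


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("cleaned.htm fragment well-formed:", is_well_formed(CLEANED_FRAGMENT))
    print("escaped variant well-formed:", is_well_formed(ESCAPED_FRAGMENT))
```

A check like this could be run on Tidy's output before handing the file to the XSLT step, so that a malformed attribute is reported early instead of surfacing as an xsltproc error.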