Best way to "fix" malformed html for use in an xsl transform
<p>I have an input xml document that contains malformed html which has been xml-encoded, i.e. the xml document itself is technically valid.</p>
<p>I am applying an xsl transform to the xml. The transform's own output is well-formed xhtml5, but the result also contains the malformed html.</p>
<p>Examples of the bad html:</p>
<ul>
<li>html, head and body tags in html fragments</li>
<li>font tags</li>
<li>mismatched quotes</li>
<li>unclosed tags</li>
<li>extra close tags with no matching open tag</li>
<li>close tags in the wrong order (e.g. <code>&lt;b&gt;&lt;u&gt;text&lt;/b&gt;&lt;/u&gt;</code>)</li>
</ul>
<p>In my situation I actually don't care that the html is malformed. I only care that <em>my</em> closing tags match my opening tags, regardless of what goes in between.</p>
<p>So my question is: what is the best way to either</p>
<ol>
<li>clean up the html sufficiently that it does not affect other tags (preferably from within the transform itself),</li>
<li>or somehow mark a close tag so that html5-compatible browsers recognise it as matching a particular open tag, regardless of whatever nasty markup may be in between?</li>
</ol>
<p>For 2, I have no ideas at all. For 1, I have a couple of ideas, such as calling an external tool like tidy or using a .NET sgml parser.</p>
<p>.NET xsl scripts (<code>msxsl:script</code>) are acceptable, if undesirable.</p>
<p>Example source:</p>
<pre><code>&lt;xml&gt;
&amp;lt;b&amp;gt;&amp;lt;u&amp;gt;bad html&amp;lt;/b&amp;gt;&amp;lt;/u&amp;gt;
&lt;/xml&gt;
</code></pre>
<p>Example output:</p>
<pre><code>&lt;div id="MyDiv"&gt;
&lt;b&gt;&lt;u&gt;bad html&lt;/b&gt;&lt;/u&gt;
&lt;/div&gt; &lt;!-- this /div absolutely must match the opening div regardless of what might be in the bad html --&gt;
</code></pre>
<p>What other approaches are available?</p>
<p>C#, VS2012, xslt 1.0 only</p>