<p>This can be done with regex; however, it's not as simple as you suggest. You will need to find the valid tags and process them in order to make this work. It just so happens that I did this some time ago when writing a fast and lightweight XML/HTML parser. The code is available at:</p>
<p><a href="http://csharptest.net/browse/src/Library/Html/XmlLightParser.cs" rel="noreferrer">http://csharptest.net/browse/src/Library/Html/XmlLightParser.cs</a> <a href="http://csharptest.net/browse/src/Library/Html/XmlLightInterfaces.cs" rel="noreferrer">http://csharptest.net/browse/src/Library/Html/XmlLightInterfaces.cs</a></p>
<p>To use the parser, you implement the interface <code>IXmlLightReader</code>, which is defined in the latter of the two source files. The following example produces your desired results and also handles several cases you did not mention, such as CDATA sections, processing instructions, and DTDs.</p>
<pre><code>using System;
using System.IO;
using System.Web; // HttpUtility
// XmlLightParser, IXmlLightReader and XmlTagInfo come from the two source files linked above.

class RegexForBadXml
{
    // Deliberately malformed input: raw '&lt;', '&gt;' and '&amp;' inside text nodes.
    const string Input =
        "&lt;?xml version=\"1.0\"?&gt;\r\n" +
        "&lt;div&gt;\r\n" +
        "\t&lt;a href=\"link\"&gt;Link with &lt; characters&lt;/a&gt;\r\n" +
        "\t&lt;knownTag&gt;Text with character &gt; &amp;and other &amp;#BAD; stuff&lt;/knownTag&gt;\r\n" +
        "\t&lt;knownTag&gt;Text &lt; again &gt;&lt;/knownTag&gt;\r\n" +
        "\t&lt;knownTag&gt;&lt;![CDATA[ Text &lt; again &gt; ]]&gt;&lt;/knownTag&gt;\r\n" +
        "&lt;div&gt;";

    private static void Main()
    {
        var output = new StringWriter();
        XmlLightParser.Parse(Input, XmlLightParser.AttributeFormat.Html, new OutputFormatter(output));
        Console.WriteLine(output.ToString());
    }

    // Writes every token back unchanged, except text nodes, which are re-encoded.
    private class OutputFormatter : IXmlLightReader
    {
        private readonly TextWriter _output;

        public OutputFormatter(TextWriter output) { _output = output; }

        void IXmlLightReader.StartDocument() { }
        void IXmlLightReader.EndDocument() { }

        public void StartTag(XmlTagInfo tag) { _output.Write(tag.UnparsedTag); }
        public void EndTag(XmlTagInfo tag) { _output.Write(tag.UnparsedTag); }

        // Decode first so entities that are already valid are not double-encoded.
        public void AddText(string content)
        {
            _output.Write(HttpUtility.HtmlEncode(HttpUtility.HtmlDecode(content)));
        }

        public void AddComment(string comment) { _output.Write(comment); }
        public void AddCData(string cdata) { _output.Write(cdata); }
        public void AddControl(string cdata) { _output.Write(cdata); }
        public void AddInstruction(string instruction) { _output.Write(instruction); }
    }
}
</code></pre>
<p>The preceding program outputs the following results:</p>
<pre><code>&lt;?xml version="1.0"?&gt;
&lt;div&gt;
    &lt;a href="link"&gt;Link with &amp;lt; characters&lt;/a&gt;
    &lt;knownTag&gt;Text with character &amp;gt; &amp;amp;and other &amp;amp;BAD; stuff&lt;/knownTag&gt;
    &lt;knownTag&gt;Text &amp;lt; again &amp;gt;&lt;/knownTag&gt;
    &lt;knownTag&gt;&lt;![CDATA[ Text &lt; again &gt; ]]&gt;&lt;/knownTag&gt;
&lt;div&gt;
</code></pre>
<p>Note: I added the XML declaration, CDATA, and '&amp;' text for testing only.</p>
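<p>To illustrate why the regex-only route is harder than it looks, here is a minimal sketch of that approach. It is not the parser linked above: <code>TagAwareEscaper</code> and its <code>Markup</code> pattern are hypothetical names invented for this example, and it uses <code>System.Net.WebUtility</code> in place of <code>HttpUtility</code> so that it needs no reference to System.Web. The pattern is a simplified assumption about what counts as a valid tag, not a full XML grammar.</p>
<pre><code>using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: copies anything that looks like markup verbatim
// and HTML-encodes everything in between.
public static class TagAwareEscaper
{
    private static readonly Regex Markup = new Regex(
        @"&lt;!--.*?--&gt;" +            // comment
        @"|&lt;!\[CDATA\[.*?\]\]&gt;" +  // CDATA section
        @"|&lt;\?.*?\?&gt;" +            // processing instruction
        @"|&lt;![A-Za-z][^&gt;]*&gt;" +     // DOCTYPE-style declaration (simplified)
        @"|&lt;/?[A-Za-z][\w:.-]*" +  // start or end tag: the name must follow '&lt;' directly
        @"(?:\s+[\w:.-]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[^\s""'&gt;]+))?)*" + // attributes
        @"\s*/?&gt;",
        RegexOptions.Singleline);

    public static string Escape(string input)
    {
        var result = new StringBuilder();
        int position = 0;
        foreach (Match match in Markup.Matches(input))
        {
            // Text between the previous markup token and this one.
            result.Append(EncodeText(input.Substring(position, match.Index - position)));
            result.Append(match.Value); // markup is passed through untouched
            position = match.Index + match.Length;
        }
        result.Append(EncodeText(input.Substring(position)));
        return result.ToString();
    }

    // Decode first so entities that are already valid are not double-encoded.
    private static string EncodeText(string text)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlDecode(text));
    }
}
</code></pre>
<p>With this sketch, <code>TagAwareEscaper.Escape("&lt;knownTag&gt;Text &lt; again &gt;&lt;/knownTag&gt;")</code> returns <code>&lt;knownTag&gt;Text &amp;lt; again &amp;gt;&lt;/knownTag&gt;</code>. Its weakness is the reason a tokenizing parser is the safer choice: text such as <code>a&lt;b and c&gt;d</code> looks like a tag with two attributes to the pattern, so it is passed through unescaped.</p>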