Sanitize HTML-encoded text (&#DECIMAL; notation) from AntiXSS v3 output
<p>I am trying to make comments in a blog engine XSS-safe. I have tried a lot of different approaches but find it very difficult.</p>
<p>When I display the comments, I first use <a href="http://www.codeplex.com/AntiXSS" rel="nofollow noreferrer">Microsoft AntiXss 3.0</a> to HTML-encode the whole thing. Then I try to HTML-decode the safe tags using a whitelist approach.</p>
<p>I have been looking at <a href="http://refactormycode.com/codes/333-sanitize-html#refactor_44440" rel="nofollow noreferrer">Steve Downing's example</a> in Atwood's "Sanitize HTML" thread at refactormycode.</p>
<p>My problem is that the AntiXss library encodes characters in &amp;#DECIMAL; notation, and I don't know how to rewrite Steve's example accordingly, since my regex knowledge is limited.</p>
<p>I tried the following, where I simply replaced the named entities with their decimal forms, but it does not work as expected:</p>
<pre><code>&amp;lt; with &amp;#60;
&amp;gt; with &amp;#62;
</code></pre>
<p>My rewrite:</p>
<pre><code>using System.Text.RegularExpressions;
using System.Web;

class HtmlSanitizer
{
    /// &lt;summary&gt;
    /// A regex that matches things that look like an HTML tag after HtmlEncoding. Splits the input so we can get discrete
    /// chunks that start with &amp;lt; and end with either end of line or &amp;gt;
    /// &lt;/summary&gt;
    private static Regex _tags = new Regex("&amp;#60;(?!&amp;#62;).+?(&amp;#62;|$)",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled);

    /// &lt;summary&gt;
    /// A regex that will match tags on the whitelist, so we can run them through
    /// HttpUtility.HtmlDecode
    /// FIXME - Could be improved, since this might decode &amp;gt; etc in the middle of
    /// an a/link tag (i.e. in the text in between the opening and closing tag)
    /// &lt;/summary&gt;
    private static Regex _whitelist = new Regex(@"
        ^&amp;#60;/?(a|b(lockquote)?|code|em|h(1|2|3)|i|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)&amp;#62;$
        |^&amp;#60;(b|h)r\s?/?&amp;#62;$
        |^&amp;#60;a(?!&amp;#62;).+?&amp;#62;$
        |^&amp;#60;img(?!&amp;#62;).+?/?&amp;#62;$",
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace |
        RegexOptions.ExplicitCapture | RegexOptions.Compiled);

    /// &lt;summary&gt;
    /// HtmlDecode any potentially safe HTML tags from the provided HtmlEncoded HTML input using
    /// a whitelist-based approach, leaving dangerous tags HTML-encoded
    /// &lt;/summary&gt;
    public static string Sanitize(string html)
    {
        string tagname = "";
        Match tag;
        MatchCollection tags = _tags.Matches(html);
        string safeHtml = "";

        // iterate through all HTML tags in the input, last to first,
        // so that the indexes of earlier matches stay valid
        for (int i = tags.Count - 1; i &gt; -1; i--)
        {
            tag = tags[i];
            tagname = tag.Value.ToLowerInvariant();

            if (_whitelist.IsMatch(tagname))
            {
                // If we find a tag on the whitelist, run it through
                // HtmlDecode, and re-insert it into the text
                safeHtml = HttpUtility.HtmlDecode(tag.Value);
                html = html.Remove(tag.Index, tag.Length);
                html = html.Insert(tag.Index, safeHtml);
            }
        }

        return html;
    }
}
</code></pre>
<p>My input test HTML is:</p>
<pre><code>&lt;p&gt;&lt;script language="javascript"&gt;alert('XSS')&lt;/script&gt;&lt;b&gt;bold should work&lt;/b&gt;&lt;/p&gt;
</code></pre>
<p>After AntiXss it turns into:</p>
<pre><code>&amp;#60;p&amp;#62;&amp;#60;script language&amp;#61;&amp;#34;javascript&amp;#34;&amp;#62;alert&amp;#40;&amp;#39;XSS&amp;#39;&amp;#41;&amp;#60;&amp;#47;script&amp;#62;&amp;#60;b&amp;#62;bold should work&amp;#60;&amp;#47;b&amp;#62;&amp;#60;&amp;#47;p&amp;#62;
</code></pre>
<p>When I run the version of Sanitize(string html) above, it gives me:</p>
<pre><code>&lt;p&gt;&lt;script language="javascript"&gt;alert&amp;#40;&amp;#39;XSS&amp;#39;&amp;#41;&lt;/script&gt;&lt;b&gt;bold should work&lt;/b&gt;&lt;/p&gt;
</code></pre>
<p>The whitelist regex matches the script tag, even though script is not on the whitelist and should stay encoded. Any help with this would be highly appreciated.</p>
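<p>One likely cause, sketched here under stated assumptions rather than as a verified answer: <code>_whitelist</code> is built with <code>RegexOptions.IgnorePatternWhitespace</code>, and in that mode .NET treats an unescaped <code>#</code> as the start of a comment that runs to the end of the line. Steve's version used <code>&amp;lt;</code> and <code>&amp;gt;</code>, which contain no <code>#</code>, but in the decimal rewrite everything after <code>^&amp;</code> on each line is commented out, so the whitelist accepts any encoded tag, including <code>script</code>. The sketch below compares that pattern with a variant that escapes <code>#</code> as <code>\#</code>. It also assumes, based on the AntiXss output shown above, that <code>/</code> arrives as <code>&amp;#47;</code>, so closing tags need <code>(&amp;\#47;)?</code> where the original had <code>/?</code>. The names <code>buggyWhitelist</code>, <code>whitelist</code> and the local <code>Sanitize</code> function are illustrative only, and <code>System.Net.WebUtility.HtmlDecode</code> stands in for <code>HttpUtility.HtmlDecode</code> so the sketch runs without System.Web.</p>

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Splits AntiXss output into chunks that look like encoded tags (&#60; ... &#62;).
// This regex does not use IgnorePatternWhitespace, so '#' is already literal here.
var tags = new Regex("&#60;(?!&#62;).+?(&#62;|$)",
    RegexOptions.Singleline | RegexOptions.ExplicitCapture);

// Whitelist as written in the question (minus RegexOptions.Compiled). With
// IgnorePatternWhitespace, each unescaped '#' starts a comment running to the end
// of its line, so the pattern effectively becomes ^&|^&|^&|^& and accepts any tag.
var buggyWhitelist = new Regex(@"
    ^&#60;/?(a|b(lockquote)?|code|em|h(1|2|3)|i|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)&#62;$
    |^&#60;(b|h)r\s?/?&#62;$
    |^&#60;a(?!&#62;).+?&#62;$
    |^&#60;img(?!&#62;).+?/?&#62;$",
    RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

// Assumed fix: escape every '#' as \# and, because AntiXss encodes '/' as &#47;,
// accept (&\#47;)? wherever the original pattern expected a literal '/'.
var whitelist = new Regex(@"
    ^&\#60;(&\#47;)?(a|b(lockquote)?|code|em|h(1|2|3)|i|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)&\#62;$
    |^&\#60;(b|h)r\s?(&\#47;)?&\#62;$
    |^&\#60;a(?!&\#62;).+?&\#62;$
    |^&\#60;img(?!&\#62;).+?(&\#47;)?&\#62;$",
    RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

// Same idea as the question's Sanitize: walk the matches from last to first so
// earlier indexes stay valid, and decode only the tags the whitelist accepts.
string Sanitize(string html, Regex allowed)
{
    MatchCollection matches = tags.Matches(html);
    for (int i = matches.Count - 1; i >= 0; i--)
    {
        Match tag = matches[i];
        if (allowed.IsMatch(tag.Value.ToLowerInvariant()))
        {
            string decoded = WebUtility.HtmlDecode(tag.Value);
            html = html.Remove(tag.Index, tag.Length).Insert(tag.Index, decoded);
        }
    }
    return html;
}

// AntiXss output copied from the question.
const string encoded =
    "&#60;p&#62;&#60;script language&#61;&#34;javascript&#34;&#62;alert&#40;&#39;XSS&#39;&#41;" +
    "&#60;&#47;script&#62;&#60;b&#62;bold should work&#60;&#47;b&#62;&#60;&#47;p&#62;";

Console.WriteLine(Sanitize(encoded, buggyWhitelist)); // decodes the script tags too
Console.WriteLine(Sanitize(encoded, whitelist));      // leaves script encoded
```

<p>With the input from the question, the first line printed should reproduce the output reported above, with <code>script</code> decoded, while the second should leave both <code>script</code> tags encoded and decode only <code>p</code> and <code>b</code>. The <code>a</code> alternative still accepts arbitrary attributes, so this addresses only the comment problem, not attribute filtering.</p>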