plurals
  1. PO
    primarykey
    data
    text
    <p>I like Eugene's whitelist concept. I needed to do something similar to the original poster, but I needed to support all Unicode characters, not just those up to 0x00FD. The XML spec is:</p>
    <p>Char = #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]</p>
    <p>In .NET, a <code>char</code> is only 16 bits wide, so a per-character test can't "allow" 0x10000-0x10FFFF explicitly. The XML spec explicitly <em>disallows</em> the surrogate code points starting at 0xD800 from appearing. However, it is possible that if we allowed these surrogate code points in our whitelist, UTF-8 encoding our string would still produce valid XML in the end, as long as the surrogate pairs of UTF-16 characters in the .NET string were encoded to proper UTF-8. I haven't explored this, though, so I went with the safer bet and didn't allow the surrogates in my whitelist.</p>
    <p>The comments in Eugene's solution are misleading, though. The problem is that the characters we are excluding are not valid in <em>XML</em>; they are perfectly valid Unicode code points. We are not removing "non-UTF-8 characters". We are removing characters that may not appear in well-formed XML documents.</p>
    <pre><code>public static string XmlCharacterWhitelist( string in_string ) {
    if( in_string == null ) return null;

    StringBuilder sbOutput = new StringBuilder();
    char ch;

    for( int i = 0; i &lt; in_string.Length; i++ ) {
        ch = in_string[i];
        // Keep only the characters in the XML Char production (surrogates excluded).
        if( ( ch &gt;= 0x0020 &amp;&amp; ch &lt;= 0xD7FF ) ||
            ( ch &gt;= 0xE000 &amp;&amp; ch &lt;= 0xFFFD ) ||
            ch == 0x0009 ||
            ch == 0x000A ||
            ch == 0x000D ) {
            sbOutput.Append( ch );
        }
    }
    return sbOutput.ToString();
}
</code></pre>
 
