<p><strong>EDIT</strong> After the merge of this question with <a href="https://stackoverflow.com/questions/20674300/how-to-compare-and-in-c-sharp">How to compare &#39;μ&#39; and &#39;&#181;&#39; in C#</a><br> Original answer posted:</p>
<pre><code> "μ".ToUpper().Equals("µ".ToUpper()); // This always returns true.
</code></pre>
<p><strong>EDIT</strong> After reading the comments: yes, the method above is not reliable, because it may give wrong results for other kinds of input. Instead, we should <a href="http://msdn.microsoft.com/en-us/library/ebza6ck1.aspx" rel="nofollow noreferrer">normalize</a> the strings using full compatibility decomposition, as described on <a href="http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization" rel="nofollow noreferrer">wiki</a>. (Thanks to the answer posted by <a href="https://stackoverflow.com/questions/20674577/how-to-compare-unicode-characters-that-look-alike/20674872#20674872">BoltClock</a>)</p>
<pre><code>using System;
using System.Globalization;
using System.Text;

class Program
{
    static string GREEK_SMALL_LETTER_MU = new String(new char[] { '\u03BC' });
    static string MICRO_SIGN = new String(new char[] { '\u00B5' });

    public static void Main()
    {
        string Mus = "µμ";
        string NormalizedString = null;

        // For each character, print its code unit, its Unicode category,
        // and its value under all four normalization forms.
        foreach (char c in Mus)
        {
            string OriginalUnicodeString = c.ToString();
            if (OriginalUnicodeString.Equals(GREEK_SMALL_LETTER_MU))
                Console.WriteLine(" INFORMATION ABOUT GREEK_SMALL_LETTER_MU");
            else if (OriginalUnicodeString.Equals(MICRO_SIGN))
                Console.WriteLine(" INFORMATION ABOUT MICRO_SIGN");
            Console.WriteLine();
            ShowHexaDecimal(OriginalUnicodeString);
            Console.WriteLine("Unicode character category " + CharUnicodeInfo.GetUnicodeCategory(c));

            NormalizedString = OriginalUnicodeString.Normalize(NormalizationForm.FormC);
            Console.Write("Form C Normalized: ");
            ShowHexaDecimal(NormalizedString);

            NormalizedString = OriginalUnicodeString.Normalize(NormalizationForm.FormD);
            Console.Write("Form D Normalized: ");
            ShowHexaDecimal(NormalizedString);

            NormalizedString = OriginalUnicodeString.Normalize(NormalizationForm.FormKC);
            Console.Write("Form KC Normalized: ");
            ShowHexaDecimal(NormalizedString);

            NormalizedString = OriginalUnicodeString.Normalize(NormalizationForm.FormKD);
            Console.Write("Form KD Normalized: ");
            ShowHexaDecimal(NormalizedString);

            Console.WriteLine("_______________________________________________________________");
        }
        Console.ReadLine();
    }

    // Prints every UTF-16 code unit of the string as a four-digit hex value.
    private static void ShowHexaDecimal(string UnicodeString)
    {
        Console.Write("Hexa-Decimal Characters of " + UnicodeString + " are ");
        foreach (char c in UnicodeString)
        {
            Console.Write("{0:X4} ", (int)c);
        }
        Console.WriteLine();
    }
}
</code></pre>
<p><strong>Output</strong></p>
<pre><code>INFORMATION ABOUT MICRO_SIGN

Hexa-Decimal Characters of µ are 00B5
Unicode character category LowercaseLetter
Form C Normalized: Hexa-Decimal Characters of µ are 00B5
Form D Normalized: Hexa-Decimal Characters of µ are 00B5
Form KC Normalized: Hexa-Decimal Characters of µ are 03BC
Form KD Normalized: Hexa-Decimal Characters of µ are 03BC
________________________________________________________________
INFORMATION ABOUT GREEK_SMALL_LETTER_MU

Hexa-Decimal Characters of µ are 03BC
Unicode character category LowercaseLetter
Form C Normalized: Hexa-Decimal Characters of µ are 03BC
Form D Normalized: Hexa-Decimal Characters of µ are 03BC
Form KC Normalized: Hexa-Decimal Characters of µ are 03BC
Form KD Normalized: Hexa-Decimal Characters of µ are 03BC
________________________________________________________________
</code></pre>
<p>While reading the information in <a href="http://en.wikipedia.org/wiki/Unicode_equivalence" rel="nofollow noreferrer">Unicode_equivalence</a> I found</p>
<blockquote> <p>The choice of equivalence criteria can affect search results. For instance some typographic ligatures like U+FB03 (ffi), ..... 
so a <strong>search</strong> for U+0066 (f) as substring would <strong>succeed</strong> in an <strong>NFKC</strong> normalization of U+FB03 but not in <strong>NFC</strong> normalization of U+FB03. </p> </blockquote>
<p>So to compare for equivalence we should normally use <strong><code>FormKC</code></strong>, i.e. NFKC normalization, or <strong><code>FormKD</code></strong>, i.e. NFKD normalization.<br> I was a little curious to know more about all the Unicode characters, so I wrote a sample that iterates over all the Unicode characters in <code>UTF-16</code>, and I got some results I want to discuss:</p>
<ul>
<li>Information about characters whose <code>FormC</code> and <code>FormD</code> normalized values were not equivalent<br> <code>Total: 12,118</code><br> <code>Character (int value): 192-197, 199-207, 209-214, 217-221, 224-253, ..... 44032-55203</code></li>
<li>Information about characters whose <code>FormKC</code> and <code>FormKD</code> normalized values were not equivalent<br> <code>Total: 12,245</code><br> <code>Character (int value): 192-197, 199-207, 209-214, 217-221, 224-228, ..... 44032-55203, 64420-64421, 64432-64433, 64490-64507, 64512-64516, 64612-64617, 64663-64667, 64735-64736, 65153-65164, 65269-65274</code></li>
<li>For every character whose <code>FormC</code> and <code>FormD</code> normalized values were not equivalent, the <code>FormKC</code> and <code>FormKD</code> normalized values were also not equivalent, except for these characters<br> Characters: <code>901 '΅', 8129 '῁', 8141 '῍', 8142 '῎', 8143 '῏', 8157 '῝', 8158 '῞', 8159 '῟', 8173 '῭', 8174 '΅'</code></li>
<li>Additional characters whose <code>FormKC</code> and <code>FormKD</code> normalized values were not equivalent, but whose <code>FormC</code> and <code>FormD</code> normalized values were equivalent<br> <code>Total: 119</code><br> Characters: <code>452 'DŽ' 453 'Dž' 454 'dž' 12814 '㈎' 12815 '㈏' 12816 '㈐' 12817 '㈑' 12818 '㈒' 12819 '㈓' 12820 '㈔' 12821 '㈕', 12822 '㈖' 12823 '㈗' 12824 '㈘' 12825 '㈙' 12826 '㈚' 12827 '㈛' 12828 '㈜' 12829 '㈝' 12830 '㈞' 12910 '㉮' 12911 '㉯' 12912 '㉰' 12913 '㉱' 12914 '㉲' 12915 '㉳' 12916 '㉴' 12917 '㉵' 12918 '㉶' 12919 '㉷' 12920 '㉸' 12921 '㉹' 12922 '㉺' 12923 '㉻' 12924 '㉼' 12925 '㉽' 12926 '㉾' 13056 '㌀' 13058 '㌂' 13060 '㌄' 13063 '㌇' 13070 '㌎' 13071 '㌏' 13072 '㌐' 13073 '㌑' 13075 '㌓' 13077 '㌕' 13080 '㌘' 13081 '㌙' 13082 '㌚' 13086 '㌞' 13089 '㌡' 13092 '㌤' 13093 '㌥' 13094 '㌦' 13099 '㌫' 13100 '㌬' 13101 '㌭' 13102 '㌮' 13103 '㌯' 13104 '㌰' 13105 '㌱' 13106 '㌲' 13108 '㌴' 13111 '㌷' 13112 '㌸' 13114 '㌺' 13115 '㌻' 13116 '㌼' 13117 '㌽' 13118 '㌾' 13120 '㍀' 13130 '㍊' 13131 '㍋' 13132 '㍌' 13134 '㍎' 13139 '㍓' 13140 '㍔' 13142 '㍖' .......... ﺋ' 65164 'ﺌ' 65269 'ﻵ' 65270 'ﻶ' 65271 'ﻷ' 65272 'ﻸ' 65273 'ﻹ' 65274'</code></li>
<li>Some characters <strong>cannot be normalized</strong>; attempting to normalize them throws an <strong><code>ArgumentException</code></strong><br> <code>Total: 2,081</code><br> <code>Characters (int value): 55296-57343, 64976-65007, 65534</code></li>
</ul>
<p>These links are helpful for understanding the rules that govern Unicode equivalence:</p>
<ol>
<li><a href="http://en.wikipedia.org/wiki/Unicode_equivalence" rel="nofollow noreferrer">Unicode_equivalence</a> </li>
<li><a href="http://en.wikipedia.org/wiki/Unicode_compatibility_characters" rel="nofollow noreferrer">Unicode_compatibility_characters</a></li>
</ol>
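<p>The normalization advice in this answer can be turned into an actual comparison. The following is only a minimal sketch, assuming that compatibility equivalence (NFKC) is the criterion you want and that both inputs are well-formed UTF-16. The names <code>UnicodeCompare</code> and <code>AreCompatibilityEquivalent</code> are hypothetical helpers introduced here for illustration; they are not part of .NET or of the original answer.</p>

```csharp
using System;
using System.Text;

public static class UnicodeCompare
{
    // Hypothetical helper: treats two strings as equal when their
    // NFKC-normalized forms match code unit by code unit.
    // Assumption: both inputs are well-formed; String.Normalize throws
    // ArgumentException for invalid input such as lone surrogates.
    public static bool AreCompatibilityEquivalent(string a, string b)
    {
        return string.Equals(
            a.Normalize(NormalizationForm.FormKC),
            b.Normalize(NormalizationForm.FormKC),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        string greekMu = "\u03BC";   // GREEK SMALL LETTER MU
        string microSign = "\u00B5"; // MICRO SIGN

        // A plain ordinal comparison sees two different code units.
        Console.WriteLine(string.Equals(greekMu, microSign, StringComparison.Ordinal)); // False

        // After NFKC normalization both become U+03BC.
        Console.WriteLine(AreCompatibilityEquivalent(greekMu, microSign)); // True

        // The ligature U+FB03 decomposes to the three letters "ffi" under NFKC.
        Console.WriteLine(AreCompatibilityEquivalent("\uFB03", "ffi")); // True
    }
}
```

<p>An ordinal comparison after normalization keeps the result independent of the current culture. Whether <code>FormKC</code> or <code>FormKD</code> is used should not matter for an equality check, as long as both sides use the same form.</p>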