How does .NET determine the Unicode category of a character?

    <p>I was looking in mscorlib.dll with .NET Reflector and stumbled upon the Char class. I had always wondered how methods like Char.IsLetter were implemented. I expected a huge list of tests, but by digging a little I found a really short piece of code that determines the Unicode category. However, this code uses some kind of tables and some bit-shifting voodoo. Can anyone explain to me how this is done, or point me to some resources?</p>
    <p><strong>EDIT:</strong> Here's the code. It's in System.Globalization.CharUnicodeInfo.</p>
    <pre><code>internal static unsafe byte InternalGetCategoryValue(int ch, int offset)
{
    ushort num = s_pCategoryLevel1Index[ch &gt;&gt; 8];
    num = s_pCategoryLevel1Index[num + ((ch &gt;&gt; 4) &amp; 15)];
    byte* numPtr = (byte*) (s_pCategoryLevel1Index + num);
    byte num2 = numPtr[ch &amp; 15];
    return s_pCategoriesValue[(num2 * 2) + offset];
}
</code></pre>
    <p><code>s_pCategoryLevel1Index</code> is a <code>ushort*</code> and <code>s_pCategoriesValue</code> is a <code>byte*</code>.</p>
    <p>Both are created in the CharUnicodeInfo static constructor:</p>
    <pre><code>static unsafe CharUnicodeInfo()
{
    s_pDataTable = GlobalizationAssembly.GetGlobalizationResourceBytePtr(typeof(CharUnicodeInfo).Assembly, "charinfo.nlp");
    UnicodeDataHeader* headerPtr = (UnicodeDataHeader*) s_pDataTable;
    s_pCategoryLevel1Index = (ushort*) (s_pDataTable + headerPtr-&gt;OffsetToCategoriesIndex);
    s_pCategoriesValue = s_pDataTable + ((byte*) headerPtr-&gt;OffsetToCategoriesValue);
    s_pNumericLevel1Index = (ushort*) (s_pDataTable + headerPtr-&gt;OffsetToNumbericIndex);
    s_pNumericValues = s_pDataTable + ((byte*) headerPtr-&gt;OffsetToNumbericValue);
    s_pDigitValues = (DigitValues*) (s_pDataTable + headerPtr-&gt;OffsetToDigitValue);
    nativeInitTable(s_pDataTable);
}
</code></pre>
    <p>Here is the UnicodeDataHeader.</p>
    <pre><code>internal struct UnicodeDataHeader
{
    // Fields
    [FieldOffset(40)] internal uint OffsetToCategoriesIndex;
    [FieldOffset(0x2c)] internal uint OffsetToCategoriesValue;
    [FieldOffset(0x34)] internal uint OffsetToDigitValue;
    [FieldOffset(0x30)] internal uint OffsetToNumbericIndex;
    [FieldOffset(0x38)] internal uint OffsetToNumbericValue;
    [FieldOffset(0)] internal char TableName;
    [FieldOffset(0x20)] internal ushort version;
}
</code></pre>
    <p><strong>Note:</strong> I hope this doesn't break any licence. If so, I'll remove the code.</p>
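The InternalGetCategoryValue method in the question reads like a three-stage lookup table (sometimes called a multistage table). The code point is split into three bit fields. The high byte (ch >> 8) indexes a first-level array. The next four bits ((ch >> 4) & 15) index a 16-entry second-level block. The low four bits (ch & 15) index a 16-byte third-level block, and the byte found there is a record number into s_pCategoriesValue. All three levels live in the same ushort array, and each stored value is an offset in ushort units from its start, which is why the last step casts s_pCategoryLevel1Index + num to byte*. The table can stay small because ranges with identical properties can point at the same block. Below is a minimal C transliteration with made-up toy tables. It classifies only ASCII letters, handles only code points below 0x10000, and uses arbitrary placeholder values for the two bytes per record. The names index_table, values, build_tables, get_category_value and the L1/L2_*/L3_* constants are hypothetical and not .NET's.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the single index array, in ushort units (hypothetical toy data).
   Only code points below 0x10000 are handled, so level 1 has 256 entries. */
enum {
    L1          = 0,   /* 256 level-1 entries, indexed by ch >> 8              */
    L2_OTHER    = 256, /* level-2 block shared by high bytes 0x01..0xFF        */
    L2_ASCII    = 272, /* level-2 block for high byte 0x00                     */
    L3_OTHER    = 288, /* level-3 block: 16 bytes = 8 ushorts, all "other"     */
    L3_AT_ALPHA = 296, /* '@' then 'A'..'O'; reused for '`' then 'a'..'o'      */
    L3_ALPHA_P  = 304, /* 'P'..'Z' then 5 non-letters; reused for 'p'..'z'     */
    INDEX_SIZE  = 312
};

enum { REC_OTHER = 0, REC_LETTER = 1 }; /* record numbers stored at level 3 */

static uint16_t index_table[INDEX_SIZE];

/* Two-byte records; the offset argument picks byte 0 or byte 1.
   The byte values are arbitrary placeholders. */
static const uint8_t values[] = {
    0, 100, /* REC_OTHER  */
    1, 101  /* REC_LETTER */
};

static void build_tables(void) {
    uint8_t *other    = (uint8_t *)(index_table + L3_OTHER);
    uint8_t *at_alpha = (uint8_t *)(index_table + L3_AT_ALPHA);
    uint8_t *alpha_p  = (uint8_t *)(index_table + L3_ALPHA_P);

    /* Level 1: every high byte except 0x00 points at the shared block. */
    for (int i = 0; i < 256; i++)
        index_table[L1 + i] = L2_OTHER;
    index_table[L1 + 0x00] = L2_ASCII;

    /* Level 2: ushort offsets of level-3 blocks. */
    for (int i = 0; i < 16; i++) {
        index_table[L2_OTHER + i] = L3_OTHER;
        index_table[L2_ASCII + i] = L3_OTHER;
    }
    index_table[L2_ASCII + 0x4] = L3_AT_ALPHA; /* U+0040..U+004F */
    index_table[L2_ASCII + 0x5] = L3_ALPHA_P;  /* U+0050..U+005F */
    index_table[L2_ASCII + 0x6] = L3_AT_ALPHA; /* U+0060..U+006F, same pattern */
    index_table[L2_ASCII + 0x7] = L3_ALPHA_P;  /* U+0070..U+007F, same pattern */

    /* Level 3: one record number (a byte) per code point in the block. */
    for (int i = 0; i < 16; i++) {
        other[i]    = REC_OTHER;
        at_alpha[i] = (i >= 1) ? REC_LETTER : REC_OTHER;  /* i == 0 is '@' or '`' */
        alpha_p[i]  = (i <= 10) ? REC_LETTER : REC_OTHER; /* i == 10 is 'Z' or 'z' */
    }
}

/* Same three steps as the decompiled InternalGetCategoryValue. */
static uint8_t get_category_value(int ch, int offset) {
    assert(ch >= 0 && ch < 0x10000 && (offset == 0 || offset == 1));
    uint16_t num = index_table[L1 + (ch >> 8)];  /* level 1: bits 15..8 */
    num = index_table[num + ((ch >> 4) & 15)];   /* level 2: bits 7..4  */
    const uint8_t *block = (const uint8_t *)(index_table + num);
    uint8_t rec = block[ch & 15];                /* level 3: bits 3..0  */
    return values[rec * 2 + offset];             /* pick one byte of the record */
}
```

After calling build_tables() once, get_category_value('A', 0) returns 1, get_category_value('z', 1) returns 101, and get_category_value('[', 0) returns 0. U+0040..U+004F and U+0060..U+006F share one level-3 block, and every high byte except 0x00 shares one level-2 block. As a result, the whole toy index is 312 ushorts (624 bytes), whereas a flat table with one byte per code point below 0x10000 would need 65,536 bytes.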
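The static constructor in the question mostly does pointer bookkeeping. charinfo.nlp appears to be loaded as a single block of bytes, and UnicodeDataHeader is overlaid on its start. Each Offset* field is then a byte offset from the start of that block, so each table pointer is s_pDataTable plus the corresponding offset. The expression s_pDataTable + ((byte*) headerPtr->OffsetToCategoriesValue) is most likely a decompiler artifact, because C# does not allow adding two pointers. Below is a minimal C sketch of that step. It assumes the field positions from the decompiled struct and a file written in the reading machine's byte order. The names locate_category_tables, read_header_u32, struct category_tables and the HDR_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte positions of the header fields, copied from the decompiled
   UnicodeDataHeader ([FieldOffset(40)] is 0x28, and so on). */
enum {
    HDR_OFFSET_TO_CATEGORIES_INDEX = 0x28,
    HDR_OFFSET_TO_CATEGORIES_VALUE = 0x2C,
    HDR_OFFSET_TO_NUMERIC_INDEX    = 0x30,
    HDR_OFFSET_TO_DIGIT_VALUE      = 0x34,
    HDR_OFFSET_TO_NUMERIC_VALUE    = 0x38,
    HDR_MIN_SIZE                   = 0x3C /* end of the last uint field */
};

/* Reads a uint32 header field in host byte order (assumption: the file
   was produced for the same byte order as the machine reading it). */
static uint32_t read_header_u32(const uint8_t *blob, size_t field_pos) {
    uint32_t v;
    memcpy(&v, blob + field_pos, sizeof v);
    return v;
}

/* Hypothetical counterparts of s_pCategoryLevel1Index and s_pCategoriesValue. */
struct category_tables {
    const uint16_t *level1_index;
    const uint8_t  *values;
};

/* Each Offset* field is relative to the start of the blob, so locating
   a table is just "base pointer + offset". */
static struct category_tables locate_category_tables(const uint8_t *blob, size_t blob_size) {
    struct category_tables t;
    assert(blob_size >= HDR_MIN_SIZE);
    uint32_t index_off = read_header_u32(blob, HDR_OFFSET_TO_CATEGORIES_INDEX);
    uint32_t value_off = read_header_u32(blob, HDR_OFFSET_TO_CATEGORIES_VALUE);
    assert(index_off < blob_size && value_off < blob_size);
    t.level1_index = (const uint16_t *)(blob + index_off);
    t.values       = blob + value_off;
    return t;
}
```

memcpy avoids unaligned reads of the header fields. The resulting level1_index pointer is only safe to dereference if the stored offset is even and the blob itself is suitably aligned, which the real resource loader presumably guarantees.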