    <blockquote> <p>I am asking for the count of all the possible valid combinations in Unicode with explanation.</p> </blockquote>
    <p><strong>1,111,998</strong>: 17 planes &times; 65,536 code points per plane &minus; 2,048 surrogates &minus; 66 noncharacters</p>
    <p>Note that UTF-8 and UTF-32 could theoretically encode many more than 17 planes, but the range is restricted based on the <a href="https://stackoverflow.com/questions/130438/do-utf-8-utf-16-and-utf-32-unicode-encodings-differ-in-the-number-of-characters">limitations of the UTF-16 encoding</a>.</p>
    <p><strong>109,384</strong> code points are actually assigned in <a href="http://www.unicode.org/versions/Unicode6.0.0/ch01.pdf" rel="noreferrer">Unicode 6.0</a>.</p>
    <blockquote> <p>I also don't understand why continuation bytes have restrictions even though starting byte of that char clears how long it should be.</p> </blockquote>
    <p>The purpose of this restriction in UTF-8 is to make the encoding <a href="http://en.wikipedia.org/wiki/Self-synchronizing_code" rel="noreferrer">self-synchronizing</a>.</p>
    <p>For a counterexample, consider the Chinese GB18030 encoding. There, the letter <code>ß</code> is represented as the byte sequence <code>81 30 89 38</code>, which contains the encodings of the digits <code>0</code> (<code>30</code>) and <code>8</code> (<code>38</code>). So if you have a string-searching function that is not designed for this encoding-specific quirk, a search for the digit <code>8</code> will find a false positive within the letter <code>ß</code>.</p>
    <p>In UTF-8, this cannot happen, because lead bytes and trail bytes never overlap, which guarantees that the encoding of a shorter character can never occur within the encoding of a longer character.</p>
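The arithmetic and the GB18030 counterexample above can be checked with a short Python sketch. It assumes Python's built-in `gb18030` and `utf-8` codecs. The helper name `contains_bytes` is hypothetical and only illustrates a naive byte-level search, not any particular library's search routine.

```python
# Count from the answer: all code points in 17 planes, minus surrogates
# and noncharacters.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000            # 65,536
SURROGATES = 0xDFFF - 0xD800 + 1           # 2,048 (U+D800..U+DFFF)
# Noncharacters: U+FDD0..U+FDEF (32) plus the last two code points
# of every plane (2 * 17 = 34), 66 in total.
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES

valid_count = PLANES * CODE_POINTS_PER_PLANE - SURROGATES - NONCHARACTERS


def contains_bytes(haystack: str, needle: str, encoding: str) -> bool:
    """Hypothetical naive search: compares raw bytes and ignores
    character boundaries in the given encoding."""
    return needle.encode(encoding) in haystack.encode(encoding)


def trail_bytes_are_distinct(text: str) -> bool:
    """Check that every non-ASCII byte after the first byte of each
    character's UTF-8 encoding lies in the trail range 0x80..0xBF."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if any(not 0x80 <= b <= 0xBF for b in encoded[1:]):
            return False
    return True


if __name__ == "__main__":
    print(valid_count)
    print("ß".encode("gb18030").hex(" "))
    print(contains_bytes("ß", "8", "gb18030"))  # false positive expected
    print(contains_bytes("ß", "8", "utf-8"))    # no match expected
    print(trail_bytes_are_distinct("ß€😀"))
```

The byte-level search reports a match for `8` inside the GB18030 encoding of `ß` but not inside its UTF-8 encoding, because UTF-8 trail bytes never fall in the ASCII range.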