<p>You are confusing code points, graphemes and encodings.</p> <p>The encoding is how code points are converted into an octet stream for storage, transmission or processing. Both UTF-8 and UTF-16 are variable-width encodings, with different code points needing a different number of octets: 1 to 4 for UTF-8 (the original design allowed up to 6, but RFC 3629 limits it to 4) and either 2 or 4 for UTF-16.</p> <p>Graphemes are "what we see as a character"; these are what is displayed. Often one code point (e.g. LATIN SMALL LETTER A) gives one grapheme, but in other cases multiple code points are needed (e.g. LATIN SMALL LETTER A, COMBINING ACUTE ACCENT and COMBINING LOW LINE to get a lower-case a with acute and underscore, as used in <a href="http://en.wikipedia.org/wiki/Kwakwala" rel="nofollow noreferrer">Kwakwala</a>). In some cases more than one sequence of code points produces the same grapheme (e.g. LATIN SMALL LETTER A WITH ACUTE followed by COMBINING LOW LINE); converting such equivalent sequences to a single canonical form is "normalisation".</p> <p>I.e. the length of the encoding of a single grapheme depends on both the encoding and the normalisation.</p> <p>The display width of the grapheme depends on the typeface, style and size, independently of the encoding length.</p> <p>For more information, see Wikipedia on <a href="http://en.wikipedia.org/wiki/Unicode" rel="nofollow noreferrer">Unicode</a> and <a href="http://unicode.org/" rel="nofollow noreferrer">Unicode's home</a>. There are also some excellent books, perhaps most notably "<a href="http://oreilly.com/catalog/9780596102425/" rel="nofollow noreferrer">Fonts &amp; Encodings</a>" by Yannis Haralambous, O'Reilly.</p>
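<p>As a rough illustration of the point about encoding and normalisation, here is a minimal Python sketch. It assumes the underscore in the example is COMBINING LOW LINE (U+0332); the helper name <code>describe</code> is made up for this example and is not part of any library.</p>

```python
import unicodedata


def describe(text):
    """Return (code points, UTF-8 octets, UTF-16 octets) for a string.

    UTF-16 is encoded as little-endian without a BOM so that only the
    octets of the text itself are counted.
    """
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
    )


# One grapheme written as three code points:
# LATIN SMALL LETTER A + COMBINING ACUTE ACCENT + COMBINING LOW LINE
# (assumption: U+0332 stands in for the "underscore" mentioned in the text).
decomposed = "a\u0301\u0332"

# Two normalisation forms of the same grapheme.
nfc = unicodedata.normalize("NFC", decomposed)  # composed where possible
nfd = unicodedata.normalize("NFD", decomposed)  # fully decomposed

for label, text in (("NFC", nfc), ("NFD", nfd)):
    code_points, utf8_octets, utf16_octets = describe(text)
    print(label, code_points, "code points,",
          utf8_octets, "octets in UTF-8,",
          utf16_octets, "octets in UTF-16")

# A code point outside the Basic Multilingual Plane needs 4 octets
# in both UTF-8 and UTF-16 (a surrogate pair in the latter).
print(describe("\U0001F600"))
```

<p>Both normalised strings display as the same single grapheme, yet they differ in the number of code points and in the number of octets in each encoding. None of these numbers says anything about how wide the grapheme is drawn on screen.</p>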
 
