    <p>The author is wrong. You're right: a given character has the same Unicode code point in any correct implementation of Unicode. I seriously doubt that there were multiple representations even in the infancy of Unicode; that would have defeated the whole purpose.</p>
    <p>She may be describing non-Unicode character sets such as the various ISO-8859 standards and the Windows code pages such as 1252. Unicode code points in the range 0x80 to 0x9F (decimal 128 to 159) are control characters; some 8-bit character sets have used those codes for accented letters and other symbols.</p>
    <p>The character <code>'é'</code> has the Unicode code point 233 (0xe9). That is invariant. (Are you sure the book said it's 234 in "the Unix Unicode table"?)</p>
    <p>There are alternate ways of representing certain characters; for example, <code>'é'</code> can also be represented as a combination of <code>e</code> (0x65) with a combining acute accent (0x301), but that's not what the author is talking about.</p>
    <p>Copying information from comments: the book is in French, and is titled "Le livre de Java premier langage", by Anne Tasso; the cited version is the 3rd edition, published in 2005. It's available in PDF format <a href="http://www.eyrolles.com/Chapitres/9782212116793/chap2_Tasso.pdf" rel="nofollow">here</a>. (The web site name matches the name of the publisher and copyright holder on the first page, so it appears to be a legitimate copy.)</p>
    <p>In the original French:</p>
    <blockquote> <p>Le caractère é est défini en position <code>234</code> dans la table Unicode d’Unix, alors qu’il est en position <code>200</code> dans la table Unicode du système Mac OS. Les caractères spéciaux et, par conséquent, les caractères accentués ne sont pas traités de la même façon d’un environnement à l’autre : un même code Unicode ne correspond pas au même caractère</p> </blockquote>
    <p>In English, roughly: "The character é is defined at position <code>234</code> in the Unix Unicode table, whereas it is at position <code>200</code> in the Unicode table of the Mac OS system. Special characters and, consequently, accented characters are not handled the same way from one environment to another: the same Unicode code does not correspond to the same character."</p>
    <p>As far as I can tell from my somewhat limited ability to read French, this is simply nonsense.</p>
    <p>In the quoted table, the representations shown for Unix and Windows are identical, and are consistent with actual Unicode (which makes me think the "234" in the text above it is a typo in the book).</p>
    <p>There is an 8-bit extended ASCII encoding called <a href="http://en.wikipedia.org/wiki/Mac_OS_Roman" rel="nofollow">Mac OS Roman</a>, but it's inconsistent with what's shown in the table (for example, <code>'é'</code> is 0x8E, not 0xC8), and it's clearly not Unicode.</p>
    <p><a href="http://en.wikipedia.org/wiki/Windows_1252" rel="nofollow">Windows-1252</a> is a common 8-bit encoding for Windows, and perhaps also for MS-DOS, but it's also inconsistent with anything shown in that table; <code>'é'</code> is 0xE9, just as it is in Unicode.</p>
    <p>I have no idea where the DOS and Mac OS entries came from, or where the author got the idea that Unicode code points vary across operating systems.</p>
    <p>I wonder whether some old implementations of Java implemented Unicode incorrectly (though character display would be handled by the OS, not by Java). Even if that were the case, I'd expect any modern Java implementation to get this right. Java might have problems with characters outside the <a href="http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane" rel="nofollow">Basic Multilingual Plane</a>, since a Java <code>char</code> is a 16-bit UTF-16 code unit and such characters need a surrogate pair, but that's not relevant for characters like <code>'é'</code>.</p>
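    <p>To make the distinction between a fixed Unicode code point and encoding-dependent byte values concrete, here is a minimal sketch in Python (chosen only for brevity; the book itself is about Java). It checks the code point of <code>'é'</code>, its decomposed form, and the byte it maps to in a few legacy 8-bit encodings. The choice of code pages, including the DOS code pages cp437 and cp850, is my own assumption for illustration and is not taken from the book's table.</p>

```python
import unicodedata

E_ACUTE = "\u00e9"  # 'é' as a single precomposed code point

# The code point belongs to the character, not to the operating system.
code_point = ord(E_ACUTE)  # 233 == 0xE9

# The same character written as a base letter plus a combining acute accent.
decomposed = unicodedata.normalize("NFD", E_ACUTE)
decomposed_points = [hex(ord(ch)) for ch in decomposed]

# Legacy 8-bit encodings (not Unicode) may assign different byte values
# to the same character. The list of encodings is illustrative only.
legacy_bytes = {
    name: E_ACUTE.encode(name)[0]
    for name in ("latin-1", "cp1252", "mac_roman", "cp437", "cp850")
}

print(f"U+{code_point:04X} = {code_point}")
print("decomposed:", decomposed_points)
for name, value in legacy_bytes.items():
    print(f"{name:10s} 0x{value:02X} ({value})")
```

    <p>Any byte values that differ between platforms come from these legacy encodings; the code point reported by <code>ord</code> does not change with the operating system.</p>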
 
