What is the deal with the unicode character 首(U+9996) and how java/mysql deal with it and its friends?
<p>I have a Java String that contains the Unicode character U+9996 (that's what I get if I call codePointAt()).</p>
<p>If I look at it in the debugger expressions panel (in Eclipse) then all is well and it looks like "<strong>首</strong>". However, if I print it out to the console I simply get "<strong>?</strong>". The font doesn't seem to be the problem, as I've tried setting it differently.</p>
<p>My real problem is that I'm trying to put the string into a MySQL database (with utf8 encoding). Lots of other wide characters show up fine in the db but, again, this one and some others like it show up as "?". All of which leads me to believe that the problem is on the Java side.</p>
<p>In chasing down this bug I've learnt a little about <a href="http://en.wikipedia.org/wiki/Unicode_normalization" rel="nofollow noreferrer">Unicode Normalization</a> and java.text.<a href="http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html" rel="nofollow noreferrer">Normalizer</a>, which looks like it might be relevant in this case. I've learnt that U+9996 is the canonical version of U+2FB8. U+2FB8 has exactly the same display problems, though, and anyway why would I want to transform to a non-canonical representation (even if I could, which I don't think I can)?</p>
<p>Anyway, there's one potential clue I've found which I've been unable to comprehend. <a href="http://www.fileformat.info/info/unicode/char/9996/index.htm" rel="nofollow noreferrer">This page</a> contains the words "U+9996 is not a valid unicode character" with no further explanation. It then proceeds to show how to encode this supposedly non-valid Unicode character in various Unicode encodings. So my question is basically this: WTF?</p>
<hr>
<h2>UPDATES</h2>
<ul>
<li>I'm on a Mac.</li>
<li>I'm talking about the Eclipse console.
<ul> <li>I set the console encoding to UTF-8 under Run &gt; Common</li> <li>I added <code>-Dfile.encoding=UTF-8</code> to the JVM arguments (the default was MacRoman)</li> <li>The consoles (Eclipse and Terminal.app) now show the right chars. Hooray!</li> </ul></li>
<li>I'm mostly interested in the data getting into the database correctly, though of course I'd like to get a total understanding of what's going on here.</li>
<li>I think I've fixed the database problem. I forgot to set the encoding on the <em>connection</em>. Now I don't understand why some Asian characters were getting through and not others.</li>
<li>Phew, Stack Overflow moves fast. It's hard to keep up. Thanks people.</li>
</ul>
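<p>For anyone wanting to reproduce the symptoms described above, here is a minimal sketch. It is not taken from my real code: the class name <code>EncodingDemo</code> and its helper methods are made up for illustration, and US-ASCII is used only as a stand-in for a legacy default charset such as MacRoman, which may not be installed on every JVM. It shows that encoding the string with a charset that cannot represent U+9996 produces the byte for "?", that UTF-8 encodes it without loss, and that compatibility normalization (NFKC) maps U+2FB8 to U+9996.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo class; names are illustrative only.
final class EncodingDemo {

    // Encode a string with the given charset and render the bytes as hex.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot map, which is where a "?" can come from.
    static String toHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Apply compatibility normalization (NFKC) and return the first code point.
    static int nfkcFirstCodePoint(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC).codePointAt(0);
    }

    public static void main(String[] args) {
        String head = "\u9996"; // the character from the question

        // Code point of the character, as reported by codePointAt().
        System.out.println(Integer.toHexString(head.codePointAt(0)));

        // UTF-8 can represent the character (three bytes).
        System.out.println(toHex(head, StandardCharsets.UTF_8));

        // US-ASCII (stand-in for a legacy default charset) cannot.
        System.out.println(toHex(head, StandardCharsets.US_ASCII));

        // U+2FB8 normalized with NFKC.
        System.out.println(Integer.toHexString(nfkcFirstCodePoint("\u2FB8")));
    }
}
```

<p>If the third line of output is <code>3F</code> (the byte for "?"), the character was lost during encoding rather than in the font, which would be consistent with the console behaviour before <code>-Dfile.encoding=UTF-8</code> was set.</p>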