Is ED A0 80 ED B0 80 a valid UTF-8 byte sequence?
<p><a href="http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html#decode%28java.nio.ByteBuffer%29" rel="nofollow noreferrer">java.nio.charset.Charset.forName("utf8").decode</a> decodes the byte sequence</p> <pre><code> ED A0 80 ED B0 80 </code></pre> <p>into the Unicode code point</p> <pre><code> U+10000 </code></pre> <p><a href="http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html#decode%28java.nio.ByteBuffer%29" rel="nofollow noreferrer">java.nio.charset.Charset.forName("utf8").decode</a> also decodes the byte sequence</p> <pre><code> F0 90 80 80 </code></pre> <p>into the same code point:</p> <pre><code> U+10000 </code></pre> <p>This is verified by the <a href="https://stackoverflow.com/q/8843742/632951/#snippet1">code below</a>.</p> <p>This seems to tell me that a UTF-8 decoder maps both <code>ED A0 80 ED B0 80</code> and <code>F0 90 80 80</code> to the same Unicode code point.</p> <p>However, if I visit <a href="https://www.google.com/search?query=%ED%A0%80%ED%B0%80" rel="nofollow noreferrer">https://www.google.com/search?query=<strong>%ED%A0%80%ED%B0%80</strong></a>, I can see that the page is clearly different from <a href="https://www.google.com/search?query=%F0%90%80%80" rel="nofollow noreferrer">https://www.google.com/search?query=<strong>%F0%90%80%80</strong></a>.</p> <p>Since Google Search also uses UTF-8 (correct me if I'm wrong), this suggests that UTF-8 does not decode <code>ED A0 80 ED B0 80</code> and <code>F0 90 80 80</code> into the same code point(s).</p> <p>So, by the <em>official</em> standard, should UTF-8 decode the byte sequence <code>ED A0 80 ED B0 80</code> into the Unicode code point U+10000?</p> <p> <b>Code</b>:</p> <pre><code>public class Test {
    public static void main(String[] args) {
        // Six bytes: ED A0 80 followed by ED B0 80
        java.nio.ByteBuffer bb = java.nio.ByteBuffer.wrap(new byte[] {
                (byte) 0xED, (byte) 0xA0, (byte) 0x80,
                (byte) 0xED, (byte) 0xB0, (byte) 0x80 });
        java.nio.CharBuffer cb = java.nio.charset.Charset.forName("utf8").decode(bb);
        // Print every decoded UTF-16 code unit (char) in hex
        for (int x = 0, xx = cb.limit(); x &lt; xx; ++x) {
            System.out.println(Integer.toHexString(cb.get(x)));
        }
        System.out.println();

        // Four bytes: F0 90 80 80
        bb = java.nio.ByteBuffer.wrap(new byte[] {
                (byte) 0xF0, (byte) 0x90, (byte) 0x80, (byte) 0x80 });
        cb = java.nio.charset.Charset.forName("utf8").decode(bb);
        for (int x = 0, xx = cb.limit(); x &lt; xx; ++x) {
            System.out.println(Integer.toHexString(cb.get(x)));
        }
    }
}</code></pre>
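The two byte sequences can be compared with a small sketch. It assumes the well-formed UTF-8 syntax of RFC 3629, which forbids encoding the surrogate code points U+D800 to U+DFFF; under that rule a lead byte ED may only be followed by 80..9F, so ED A0 80 is ill-formed. The six-byte form is what you get by encoding each UTF-16 surrogate of U+10000 (D800 and DC00) separately as a 3-byte sequence, the scheme Unicode calls CESU-8. The class `SurrogateUtf8Demo` and its methods are hypothetical names, and the validator is a hand-written illustration, not the JDK's decoder.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: standard UTF-8 vs. one 3-byte sequence per UTF-16 surrogate. */
public class SurrogateUtf8Demo {

    /** Standard UTF-8 bytes of a code point, produced by the JDK encoder. */
    public static byte[] utf8(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    /** Encodes each surrogate of a supplementary code point as its own 3-byte sequence (CESU-8 style). */
    public static byte[] cesu8Supplementary(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("expected a code point above U+FFFF");
        }
        char[] units = Character.toChars(codePoint); // {high surrogate, low surrogate}
        byte[] out = new byte[6];
        for (int i = 0; i < 2; i++) {
            int u = units[i]; // D800..DFFF always uses the pattern 1110xxxx 10xxxxxx 10xxxxxx
            out[3 * i] = (byte) (0xE0 | (u >> 12));
            out[3 * i + 1] = (byte) (0x80 | ((u >> 6) & 0x3F));
            out[3 * i + 2] = (byte) (0x80 | (u & 0x3F));
        }
        return out;
    }

    /** Checks bytes against the well-formed UTF-8 syntax of RFC 3629, section 4. */
    public static boolean isWellFormedUtf8(byte[] bytes) {
        int i = 0;
        while (i < bytes.length) {
            int b0 = bytes[i] & 0xFF;
            if (b0 <= 0x7F) { // one byte: ASCII
                i++;
                continue;
            }
            int len;
            int lo = 0x80; // allowed range for the second byte
            int hi = 0xBF;
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                len = 2;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                len = 3;
                if (b0 == 0xE0) lo = 0xA0;      // reject overlong 3-byte forms
                else if (b0 == 0xED) hi = 0x9F; // reject surrogates U+D800..U+DFFF
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {
                len = 4;
                if (b0 == 0xF0) lo = 0x90;      // reject overlong 4-byte forms
                else if (b0 == 0xF4) hi = 0x8F; // reject code points above U+10FFFF
            } else {
                return false; // 80..C1 and F5..FF never start a sequence
            }
            if (i + len > bytes.length) return false; // truncated sequence
            int b1 = bytes[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) return false;
            for (int k = 2; k < len; k++) {
                int bk = bytes[i + k] & 0xFF;
                if (bk < 0x80 || bk > 0xBF) return false; // continuation bytes are 80..BF
            }
            i += len;
        }
        return true;
    }

    /** Formats bytes as space-separated uppercase hex, e.g. "F0 90". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x10000;
        byte[] standard = utf8(cp);
        byte[] perSurrogate = cesu8Supplementary(cp);
        System.out.println("UTF-8:         " + hex(standard) + "  well-formed=" + isWellFormedUtf8(standard));
        System.out.println("per surrogate: " + hex(perSurrogate) + "  well-formed=" + isWellFormedUtf8(perSurrogate));
    }
}
```

Running `main` should report the four-byte form as well-formed and the six-byte form as ill-formed. How a particular decoder treats ill-formed input (rejecting it, substituting U+FFFD, or leniently combining the surrogates) is a separate implementation choice, which may explain why two systems disagree on the same bytes.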
 
