StackOverflow2013

<p>In my opinion the testing program is deeply flawed, because it performs effectively useless transformations between Strings that carry no semantic meaning.</p>

<p>If you want to check whether all byte values are valid for a given encoding, then something like this might be more like it:</p>

<pre><code>// Needs java.util.Arrays and java.io.UnsupportedEncodingException.
public static void tryEncoding(final String encoding) throws UnsupportedEncodingException {
    int badCount = 0;
    // Round-trip every unsigned byte value from 1 to 255 (inclusive).
    for (int i = 1; i <= 255; i++) {
        byte[] bytes = new byte[] { (byte) i };
        String toString = new String(bytes, encoding);
        byte[] fromString = toString.getBytes(encoding);
        if (!Arrays.equals(bytes, fromString)) {
            System.out.println("Can't encode: " + i + " - in: " + Arrays.toString(bytes) + "/ out: " + Arrays.toString(fromString) + " - result: " + toString);
            badCount++;
        }
    }
    System.out.println("Bad count: " + badCount);
}
</code></pre>

<p>Note that this testing program tests inputs using the (unsigned) <strong>byte values</strong> from 1 to 255. The code in the question uses the <em>char values</em> (equivalent to Unicode codepoints in this range) from 1 to 255.</p>

<p>Try printing the actual byte arrays handled by the program in the example and you will see that you're not actually checking all byte values and that some of your "bad" matches are duplicates of others.</p>

<p>Running this with <code>"Windows-1252"</code> as the argument produces this output:</p>

<pre>
Can't encode: 129 - in: [-127]/ out: [63] - result: �
Can't encode: 141 - in: [-115]/ out: [63] - result: �
Can't encode: 143 - in: [-113]/ out: [63] - result: �
Can't encode: 144 - in: [-112]/ out: [63] - result: �
Can't encode: 157 - in: [-99]/ out: [63] - result: �
Bad count: 5
</pre>

<p>Which tells us that <code>Windows-1252</code> doesn't accept the byte values 129, 141, 143, 144 and 157 as valid values. (Note: I'm talking about unsigned byte values here. The code above shows -127, -115, ... because Java only knows signed bytes.)</p>

<p><a href="http://en.wikipedia.org/wiki/Windows-1252" rel="nofollow noreferrer">The Wikipedia article on Windows-1252</a> seems to verify this observation by stating this:</p>

<blockquote> <p>According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused</p> </blockquote>
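<p>As a complement to the round-trip test, here is a minimal sketch that asks the decoder directly which single bytes it rejects, instead of comparing arrays after a lossy replacement. The class name <code>StrictDecodeCheck</code> and the helper <code>undecodableBytes</code> are hypothetical names chosen for this illustration; the sketch assumes a JDK whose <code>windows-1252</code> decoder treats the unused positions as errors, in which case it should report the same five values as above. It also shows the signed/unsigned conversion mentioned in the note.</p>

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class StrictDecodeCheck {

    // Hypothetical helper: returns the unsigned byte values (1..255) that the
    // named charset refuses to decode when given as a one-byte input.
    static List<Integer> undecodableBytes(String charsetName) {
        Charset charset = Charset.forName(charsetName);
        List<Integer> bad = new ArrayList<>();
        for (int i = 1; i <= 255; i++) {
            // REPORT makes the decoder throw instead of silently
            // substituting U+FFFD for bytes it cannot handle.
            CharsetDecoder decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(new byte[] { (byte) i }));
            } catch (CharacterCodingException e) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Java bytes are signed: casting 129 to byte yields -127 ...
        System.out.println((byte) 129);
        // ... and Byte.toUnsignedInt recovers the unsigned value 129.
        System.out.println(Byte.toUnsignedInt((byte) -127));
        System.out.println(undecodableBytes("windows-1252"));
    }
}
```

<p>Creating a fresh decoder per byte keeps each check independent. For multi-byte encodings such as UTF-8 this only tells you which bytes are invalid <em>on their own</em>, so it is a probe for single-byte encodings rather than a general validity test.</p>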
