StackOverflow2013

<p><strong>New analysis, based on new information.</strong><br> It looks like your problem is with the encoding of the text <em>before</em> it was stored in the Access DB. It seems it had been encoded as ISO-8859-1 or windows-1252, but decoded as cp850, resulting in the string <code>HANDICAP╔ES</code> being stored in the DB.</p>
<p>Having correctly retrieved that string from the DB, you're now trying to reverse the original encoding error and recover the string as it should have been stored: <code>HANDICAPÉES</code>. And you're accomplishing that with this line:</p>
<pre><code>String valueISO = new String(valueCP850.getBytes("CP850"), "ISO-8859-1");</code></pre>
<p><code>getBytes("CP850")</code> converts the character <code>╔</code> to the byte value <code>0xC9</code>, and the String constructor decodes that according to ISO-8859-1, resulting in the character <code>É</code>. The next line:</p>
<pre><code>String valueUTF8 = new String(valueISO.getBytes(), "UTF-8");</code></pre>
<p>...does nothing. <code>getBytes()</code> encodes the string in the platform default encoding, which is UTF-8 on your Linux system. Then the String constructor decodes it with the same encoding. Delete that line and you should still get the same result.</p>
<p>More to the point, your attempt to create a "UTF-8 string" was misguided. You don't need to concern yourself with the encoding of Java's strings--they're always UTF-16. When bringing text into a Java app, you just need to make sure you decode it with the correct encoding.</p>
<p>And if my analysis is correct, your Access driver <em>is</em> decoding it correctly; the problem is at the other end, possibly before the DB even comes into the picture. <em>That's</em> what you need to fix, because that <code>new String(getBytes())</code> hack can't be counted on to work in all cases.</p>
<hr>
<p><strong>Original analysis, based on <em>no</em> information.</strong> :-/<br> If you're seeing <code>HANDICAP╔ES</code> on the console, there's probably no problem. Given this code:</p>
<pre><code>System.out.println("HANDICAPÉES");</code></pre>
<p>The JVM converts the (Unicode) string to the platform default encoding, windows-1252, before sending it to the console. Then the console decodes that using its <em>own</em> default encoding, which happens to be cp850. So the console displays it wrong, but that's normal. If you want it to display correctly, you can change the console's encoding with this command:</p>
<pre><code>CHCP 1252</code></pre>
<p>To display the string in a GUI element, such as a JLabel, you don't have to do anything special. Just make sure you use a font that can display all the characters, but that shouldn't be a problem for French.</p>
<p>As for writing to a file, just specify the desired encoding when you create the Writer:</p>
<pre><code>OutputStreamWriter osw = new OutputStreamWriter(
    new FileOutputStream("myFile.txt"), "UTF-8");</code></pre>
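<p>To make the byte-level reasoning concrete, here is a minimal sketch that reproduces the mix-up and its reversal. The class name <code>EncodingDemo</code> and its helper methods are hypothetical, invented for illustration only. The sketch assumes the JDK in use provides the <code>IBM850</code> charset (Java's canonical name for cp850), and it writes the non-ASCII characters as Unicode escapes so the result does not depend on the source file's own encoding.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Assumption: the running JDK includes the extended charsets,
    // where cp850 is registered under the canonical name "IBM850".
    static final Charset CP850 = Charset.forName("IBM850");

    // Reproduces the suspected original error:
    // text encoded as ISO-8859-1 but decoded as cp850.
    static String corrupt(String original) {
        return new String(original.getBytes(StandardCharsets.ISO_8859_1), CP850);
    }

    // Reverses the error: re-encode as cp850 to get the original bytes back,
    // then decode those bytes as ISO-8859-1.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP850), StandardCharsets.ISO_8859_1);
    }

    // Encoding and decoding with the same charset changes nothing
    // for a well-formed string, which is why the second line is redundant.
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "HANDICAP\u00C9ES"; // \u00C9 is É
        String garbled = corrupt(original);   // É (byte 0xC9) is read back as \u2554, the box character
        String repaired = repair(garbled);

        // Print booleans rather than the strings themselves, so the
        // console's encoding cannot distort what is being checked.
        System.out.println(garbled.equals("HANDICAP\u2554ES"));
        System.out.println(repaired.equals(original));
        System.out.println(utf8RoundTrip(repaired).equals(repaired));
    }
}
```

<p>The sketch uses <code>Charset</code> objects instead of charset names, which avoids the checked <code>UnsupportedEncodingException</code> that the string-based overloads declare. It only illustrates why the workaround happens to succeed for this string; the real fix is still at the point where the text was first decoded incorrectly.</p>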
