StackOverflow2013

<p>Peculiar problem, but I think I can reproduce it by a suitably-unholy mix of UTF-8 and Latin-1 (not by just two uses of UTF-8 without an interspersed mis-step in Latin-1 though). Here's the whole weird round trip, "there and back again" (Python 2.* or IronPython should both be able to reproduce this):</p>
<pre><code># -*- coding: utf-8 -*-
uni = u'Újratárgyalja'
enc1 = uni.encode('utf-8')
enc2 = enc1.decode('latin-1').encode('utf-8')
dec3 = enc2.decode('utf-8')
dec4 = dec3.encode('latin-1').decode('utf-8')
for x in (uni, enc1, enc2, dec3, dec4):
    print repr(x), x
</code></pre>
<p>This is the interesting output...:</p>
<pre><code>u'\xdajrat\xe1rgyalja' Újratárgyalja
'\xc3\x9ajrat\xc3\xa1rgyalja' Újratárgyalja
'\xc3\x83\xc2\x9ajrat\xc3\x83\xc2\xa1rgyalja' ÃjratÃ¡rgyalja
u'\xc3\x9ajrat\xc3\xa1rgyalja' ÃjratÃ¡rgyalja
u'\xdajrat\xe1rgyalja' Újratárgyalja
</code></pre>
<p>The weird string starting with <code>Ã</code> appears as enc2, i.e. two utf-8 encodings WITH an interspersed latin-1 decoding thrown into the mix. And as you can see it can be undone by the exactly-converse sequence of operations: decode as utf-8, re-encode as latin-1, re-decode as utf-8 again -- and the original string is back (yay!).</p>
<p>I believe that the normal round-trip properties of both Latin-1 (aka ISO-8859-1) and UTF-8 should guarantee that this sequence will work (sorry, no C# around to try in that language right now, but I would expect that the encoding/decoding sequences should not depend on the specific programming language in use).</p>
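<p>For anyone on Python 3, here is a minimal sketch of the same round trip, assuming the damage really was UTF-8 bytes misread as Latin-1. The helper names <code>garble</code> and <code>repair</code> are hypothetical, introduced only for illustration. The last check illustrates the round-trip property relied on above: Python's Latin-1 codec maps every byte value 0-255 to exactly one code point, so decoding and re-encoding loses nothing.</p>

```python
# Python 3 sketch of the round trip; str and bytes are distinct types here.
# garble() and repair() are hypothetical helper names, not a library API.

# 'Újratárgyalja', written with escapes so the file encoding does not matter.
original = '\u00dajrat\u00e1rgyalja'


def garble(text):
    """Encode as UTF-8, then misread those bytes as Latin-1 (the mis-step)."""
    return text.encode('utf-8').decode('latin-1')


def repair(text):
    """Undo garble(): recover the raw bytes via Latin-1, then decode as UTF-8."""
    return text.encode('latin-1').decode('utf-8')


mojibake = garble(original)   # corresponds to dec3 in the Python 2 code
restored = repair(mojibake)   # corresponds to dec4

# Latin-1 round-trips every possible byte value without loss.
all_bytes = bytes(range(256))
latin1_lossless = all_bytes.decode('latin-1').encode('latin-1') == all_bytes

print(ascii(mojibake))
print(restored == original, latin1_lossless)
```

<p>If the text contains characters outside Latin-1, <code>repair</code> raises <code>UnicodeEncodeError</code>, and if the recovered bytes are not valid UTF-8 it raises <code>UnicodeDecodeError</code>, so a failed repair is usually visible rather than silent.</p>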
