StackOverflow2013

<p>Firstly, a clarification: your approach doesn't convert from <code>GB2312</code> to ASCII, nor would you want it to, since ASCII can't represent the string <code>'╠µ╔Ý╩²¥¦┤ª└Ý│╠ð‗'</code>. What <code>decode</code> returns is a sequence of abstract characters that can't be written to disk directly; an encoding is the serialisation rule that turns those characters into bytes. This character type is called <code>unicode</code> in Python 2 and <code>str</code> in Python 3. <code>stdout</code>, by contrast, holds raw bytes: its type is <code>str</code> in Python 2 and <code>bytes</code> in Python 3.</p>
<p>Passing raw bytes into <code>json.loads</code> makes it try to deserialise (decode) the input into a character string using UTF-8. This gives the error you see, since your input is serialised using a different, incompatible encoding. Decoding it yourself first is the right approach. In Python 3 versions before 3.6, <code>json.loads</code> requires this anyway (it strictly wants a character sequence rather than a byte sequence); from 3.6 onwards it accepts bytes, but only if they are encoded as UTF-8, UTF-16 or UTF-32.</p>
<p>There is one caveat: guessing the encoding, the way chardet does, is <em>hard</em> and potentially error-prone. It happens to work in this particular case, but you have no guarantee that it will work if you need to do something similar with other files. It <em>may</em> be the best approach available to you: usually you would expect to see the encoding mentioned early in the file's metadata, but it doesn't seem to be in this case. You should still always try to find some authoritative information on the encoding before resorting to guesswork.</p>
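<p>As a rough illustration of the points above, here is a minimal Python 3 sketch. The payload is hypothetical (it is not the data from the question) and the variable names are made up for the example. It shows that GB2312 bytes are rejected as UTF-8, that decoding with the right encoding before calling <code>json.loads</code> works, and that a wrong guess can decode without any error and still give garbage.</p>

```python
import json

# Hypothetical payload: JSON text serialised as GB2312 bytes, standing in for
# the raw process output described in the question.
original = {"name": "数据处理程序"}
raw = json.dumps(original, ensure_ascii=False).encode("gb2312")

# Treating the bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the correct encoding first, then parsing, recovers the data.
parsed = json.loads(raw.decode("gb2312"))

# A wrong guess can succeed silently: Latin-1 accepts any byte sequence,
# so this parses without error but yields garbled text.
garbled = json.loads(raw.decode("latin-1"))

print(utf8_failed, parsed == original, garbled == original)
```

<p>The last case is the reason to prefer authoritative information about the encoding: a decode that raises no exception is not evidence that the encoding was right.</p>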
