<p>It is <strong>not</strong> UTF-16 encoding. It looks like a bogus encoding, because the \uXXXX escape is independent of any UTF or UCS encoding of Unicode. <code>\u00c2\u00a3</code> really maps to the two-character string <code>Â£</code>.</p>
<p>What you should have is <code>\u00a3</code>, which is the Unicode code point for <code>£</code>.</p>
<p>{0xC2, 0xA3} is the 2-byte UTF-8 encoding of this code point.</p>
<p>If, as I think, the software that encoded the original UTF-8 string to JSON was oblivious to the fact that it was UTF-8 and blindly encoded each byte as an escaped Unicode code point, then you need to convert each pair of escaped code points back to a UTF-8 encoded character, and then decode it to the native PHP encoding to make it printable.</p>
<pre><code>function fixBadUnicode($str) {
    // The /e (eval) modifier was removed in PHP 7.0, so this runs on PHP 5 only.
    return utf8_decode(preg_replace(
        "/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e",
        'chr(hexdec("$1")).chr(hexdec("$2"))',
        $str
    ));
}
</code></pre>
<p>Example here: <a href="http://phpfiddle.org/main/code/6sq-rkn" rel="noreferrer">http://phpfiddle.org/main/code/6sq-rkn</a></p>
<p><strong>Edit:</strong></p>
<p>If you want to fix the string in order to obtain a valid JSON string, you need to use the following function:</p>
<pre><code>function fixBadUnicodeForJson($str) {
    // Longest sequences first: 4-, 3-, 2-, then 1-byte UTF-8 sequences.
    // Like fixBadUnicode(), this relies on the /e modifier (PHP 5 only).
    $str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2")).chr(hexdec("$3")).chr(hexdec("$4"))', $str);
    $str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2")).chr(hexdec("$3"))', $str);
    $str = preg_replace("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1")).chr(hexdec("$2"))', $str);
    $str = preg_replace("/\\\\u00([0-9a-f]{2})/e", 'chr(hexdec("$1"))', $str);
    return $str;
}
</code></pre>
<p><strong>Edit 2:</strong> Fixed the previous function so that it transforms any wrongly Unicode-escaped UTF-8 byte sequence into the equivalent UTF-8 character.</p>
<p>Be aware that some of these characters, which probably come from an editor such as Word, cannot be represented in ISO-8859-1 and will therefore appear as '?' after <code>utf8_decode</code>.</p>
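<p>On PHP 7 and later the <code>/e</code> modifier is no longer available, so the same idea can be sketched with <code>preg_replace_callback</code>. This is a minimal sketch, not the code from the answer above: the function name <code>repairByteEscapes</code> is hypothetical, and it adds two assumptions. Only escapes for bytes 0x80 to 0xFF are touched (so an escaped ASCII character such as <code>\u0022</code> stays as it is and the JSON remains valid), and a run of escapes is converted only when its bytes form valid UTF-8.</p>

```php
<?php
// Hypothetical helper (name not from the original answer): turns runs of
// \u00XX escapes that are really UTF-8 bytes back into those raw bytes.
function repairByteEscapes(string $str): string
{
    $result = preg_replace_callback(
        // One or more consecutive \u0080..\u00ff escapes (non-ASCII bytes only).
        '/(?:\\\\u00[89a-fA-F][0-9a-fA-F])+/',
        function (array $m): string {
            $raw = '';
            // Each escape is exactly 6 characters long: \u00XX.
            foreach (str_split($m[0], 6) as $escape) {
                $raw .= chr(hexdec(substr($escape, 4, 2)));
            }
            // Assumption: convert only if the whole run is valid UTF-8;
            // otherwise leave the escapes untouched.
            return preg_match('//u', $raw) === 1 ? $raw : $m[0];
        },
        $str
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $str;
}

$json = '{"price":"\u00c2\u00a3100"}';
$data = json_decode(repairByteEscapes($json), true);
echo bin2hex($data['price']), "\n"; // prints c2a3313030 (UTF-8 bytes of "£100")
```

<p>The result is left as UTF-8 because <code>json_decode</code> expects UTF-8 input; converting to ISO-8859-1 would only be needed for display in a Latin-1 context.</p>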
 
