<p>The character ç is encoded in the URL as %E7. This is how ISO-8859-1 encodes ç. The ISO-8859-1 character set represents each character with a single byte. The byte which represents ç can be expressed in hex as E7.</p>
<p>In Unicode, ç has a code point of U+00E7. Unlike ISO-8859-1, in which the code point (E7) is the same as its encoding (E7 in hex), Unicode has multiple encoding schemes such as UTF-8, UTF-16 and UTF-32. UTF-8 encodes U+00E7 (ç) as two bytes: C3 A7.</p>
<p>See <a href="http://www.fileformat.info/info/unicode/char/00e7/index.htm" rel="noreferrer">here</a> for other ways to encode ç.</p>
<p>As to why U+00E7 and E7 in ISO-8859-1 both use "E7", the first 256 code points in Unicode were made identical to <a href="http://en.wikipedia.org/wiki/Unicode" rel="noreferrer">ISO-8859-1</a>.</p>
<p>If this URL were UTF-8, ç would be encoded as %C3%A7. My (very limited) understanding of <a href="http://tools.ietf.org/html/rfc2616#page-220" rel="noreferrer">RFC2616</a> is that the default encoding for a URL is (currently) ISO-8859-1. Therefore, this is most likely an ISO-8859-1 encoded URL. That means the best approach is probably to check whether the encoding is valid and, if it is not, assume it is ISO-8859-1 and transcode it to UTF-8:</p>
<pre><code># query is the unescaped string; it is only transcoded when it is not valid in its current encoding
unless query.valid_encoding?
  query.encode!("UTF-8", "ISO-8859-1", :invalid =&gt; :replace, :undef =&gt; :replace, :replace =&gt; "")
end
</code></pre>
<p>Here's the process in IRB (plus an escaping at the end for fun):</p>
<pre><code># run require "cgi" first if CGI is not already loaded
a = CGI.unescape("%E7")
=&gt; "\xE7"
a.encoding
=&gt; #&lt;Encoding:UTF-8&gt;
a.valid_encoding?
=&gt; false
b = a.encode("UTF-8", "ISO-8859-1") # From ISO-8859-1 -&gt; UTF-8
=&gt; "ç"
b.encoding
=&gt; #&lt;Encoding:UTF-8&gt;
CGI.escape(b)
=&gt; "%C3%A7"
</code></pre>
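<p>As a rough sketch, the check-then-transcode idea can be wrapped in one small method. The name <code>normalize_query</code> is hypothetical and not part of any library, and the method assumes that any value which is not valid UTF-8 was sent as ISO-8859-1; that is a heuristic, not a guarantee. The last two lines only print the byte values discussed above.</p>

```ruby
require "cgi"

# Hypothetical helper (not a library method): unescape a percent-encoded
# query value and return it as valid UTF-8.
# Assumption: bytes that are not valid UTF-8 were sent as ISO-8859-1.
def normalize_query(raw)
  query = CGI.unescape(raw)
  return query if query.valid_encoding?

  # Reinterpret the raw bytes as ISO-8859-1 and transcode them to UTF-8.
  query.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace, replace: "")
end

# The same character, U+00E7, as bytes in each encoding.
p "\u00E7".encode("ISO-8859-1").bytes.map { |b| format("%02X", b) } # ["E7"]
p "\u00E7".encode("UTF-8").bytes.map { |b| format("%02X", b) }      # ["C3", "A7"]
```

<p>With this sketch, <code>normalize_query("%E7")</code> and <code>normalize_query("%C3%A7")</code> both return "ç" as a UTF-8 string. A value whose ISO-8859-1 bytes happen to form valid UTF-8 would be passed through unchanged, so the fallback only catches input that is clearly not UTF-8.</p>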
 
