<p>URIs originated in <a href="http://www.rfc-editor.org/rfc/rfc1630.txt" rel="noreferrer">RFC 1630</a>, which introduced percent-encoding as a way to represent "unsafe" characters. That original version actually mentioned the ISO Latin 1 character set as the encoding for non-ASCII characters. <a href="http://www.rfc-editor.org/rfc/rfc1738.txt" rel="noreferrer">RFC 1738</a>, published later that year, removed this reference to Latin-1 in defining URLs.</p>
<p>The query string format is actually a <em>different</em> but related encoding, application/x-www-form-urlencoded, defined in <a href="http://www.rfc-editor.org/rfc/rfc1866.txt" rel="noreferrer">RFC 1866</a> along with HTML 2.0. It was based on <a href="http://www.rfc-editor.org/rfc/rfc1738.txt" rel="noreferrer">RFC 1738</a>, but specified that spaces (not all whitespace, just the character with ASCII code 0x20) are replaced by '+' and that line breaks are to be encoded as CRLF (i.e. <code>%0D%0A</code>). The former is likely because it saves 2 bytes for a character that is very common in form submissions, at the expense of 2 extra bytes for the much less common '+', which must itself be sent as <code>%2B</code>. The latter avoids problems when transferring data between systems that use different end-of-line conventions. Non-ASCII characters were left unconsidered.</p>
<p>UTF-8 coding in URIs came over a decade later, in <a href="http://www.rfc-editor.org/rfc/rfc3986.txt" rel="noreferrer">RFC 3986</a>, although individual protocols may have specified this or another encoding of non-ASCII characters earlier. To maintain backwards compatibility, every octet of a UTF-8-encoded non-ASCII character must be percent-encoded. The companion <a href="http://www.rfc-editor.org/rfc/rfc3987.txt" rel="noreferrer">RFC 3987</a> defines "Internationalized Resource Identifiers" (IRIs), which are basically "URIs with most codepoints 160 and above allowed to appear unencoded", but many protocols still require URIs. Note that your statement above is incorrect, as a <b>U</b>RL may not contain an unencoded ü or any other non-ASCII character.</p>
<p>application/x-www-form-urlencoded has been internationalized in a different manner. The <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#application/x-www-form-urlencoded-encoding-algorithm" rel="noreferrer">HTML5 specification of application/x-www-form-urlencoded</a> explicitly allows any ASCII-compatible character set to be used for characters in the query string, and in fact different fields may use different character sets, but all non-ASCII octets must still be percent-encoded. When used in the query part of an IRI, it is possible that these characters <em>could</em> be represented unencoded if properly-normalized UTF-8 is being used as the character set, since conversion back to a URI would result in correct application/x-www-form-urlencoded data.</p>
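<p>To make the difference between the two encodings concrete, here is a minimal sketch using Python's standard <code>urllib.parse</code> module. The helper name <code>form_encode_value</code> and its <code>charset</code> parameter are my own, made up for illustration; the CRLF normalization follows the RFC 1866 rule described above and is not something <code>quote_plus</code> does by itself.</p>

```python
import re
from urllib.parse import quote, quote_plus, unquote_plus


def form_encode_value(value: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: encode one form field value in the
    application/x-www-form-urlencoded style described above."""
    # Normalize every kind of line break to CRLF before encoding.
    normalized = re.sub(r"\r\n|\r|\n", "\r\n", value)
    # quote_plus turns spaces into '+' and percent-encodes everything else
    # outside the unreserved set, including a literal '+' (as %2B).
    return quote_plus(normalized, encoding=charset)


sample = "a b+ü"

# Generic URI percent-encoding: space becomes %20, ü becomes two UTF-8 octets.
uri_style = quote(sample, safe="")

# Form encoding: space becomes '+', the literal '+' becomes %2B.
form_style = form_encode_value(sample)

# The same field value sent with Latin-1 as the (assumed) form charset.
latin1_style = form_encode_value("ü", charset="latin-1")

# A line break is always transmitted as CRLF.
multiline = form_encode_value("one\ntwo")

print(uri_style)     # a%20b%2B%C3%BC
print(form_style)    # a+b%2B%C3%BC
print(latin1_style)  # %FC
print(multiline)     # one%0D%0Atwo
print(unquote_plus(form_style))  # a b+ü
```

<p>Note that the receiver has to know which character set was used: <code>%FC</code> only decodes back to ü if it is read as Latin-1, which is the ambiguity the later specifications try to pin down.</p>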
 
