IDN-aware tools to encode/decode human-readable IRI to/from valid URI
    <p>Let's assume a user enters the address of some resource and we need to translate it to:</p>
    <pre><code>&lt;a href="valid URI here"&gt;human readable form&lt;/a&gt; </code></pre>
    <p>The HTML4 specification refers to <a href="http://www.faqs.org/rfcs/rfc3986.html" rel="nofollow noreferrer">RFC 3986</a>, which allows only ASCII alphanumeric characters and the dash in the host part and requires all non-ASCII characters in the other parts to be percent-encoded. That's what I want to put in the href attribute to make the link work properly in all browsers. An IDN should be encoded with <a href="http://en.wikipedia.org/wiki/Punycode" rel="nofollow noreferrer">Punycode</a>.</p>
    <p>The HTML5 draft refers to <a href="http://www.faqs.org/rfcs/rfc3987.html" rel="nofollow noreferrer">RFC 3987</a>, which also allows percent-encoded Unicode characters in the host part and a large subset of Unicode, unencoded, in both the host and the other parts. The user may enter the address in any of these forms. To provide a human-readable form of it, I need to decode all printable characters. Note that some parts of the address might not correspond to valid UTF-8 sequences, usually when the target site uses some other character encoding.</p>
    <p>An example of what I'd like to get:</p>
    <pre><code>&lt;a href="http://xn--80aswg.xn--p1ai/%D0%BF%D1%83%D1%82%D1%8C?%D0%B7%D0%B0%D0%BF%D1%80%D0%BE%D1%81"&gt; http://сайт.рф/путь?запрос&lt;/a&gt; </code></pre>
    <p>Are there any tools that solve these tasks? I'm especially interested in libraries for Python and JavaScript.</p>
    <p><strong>Update</strong>: I know there is a way to do percent and Punycode encoding/decoding in Python and JavaScript (without proper normalization, but I can live with that). The whole task needs much more work, and there are some pitfalls: some characters should always be encoded or never be encoded, depending on context. I wonder if there are ready-to-use libraries for the <em>whole</em> problem, since it seems to be quite common and modern browsers already do such conversions (try typing <code>http://%D1%81%D0%B0%D0%B9%D1%82.%D1%80%D1%84/</code> in Google Chrome: it will be replaced with <code>http://сайт.рф/</code>, but the HTTP request will use <code>Host: xn--80aswg.xn--p1ai</code>).</p>
    <p><strong>Update2</strong>: Vinay Sajip pointed out that Werkzeug has <code>iri_to_uri</code> and <code>uri_to_iri</code> functions that handle most cases correctly. So far I've found only 2 cases where they fail: a percent-encoded host (quite easy to fix) and invalid UTF-8 sequences (a bit tricky to do nicely, but it shouldn't be a problem).</p>
    <p>I'm still looking for a library in JavaScript. It's not hard to write, but I'd prefer to avoid reinventing the wheel.</p>
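    <p>For illustration, here is a minimal sketch of the two conversions described above, using only the Python standard library. It is not Werkzeug's implementation: the function names are merely borrowed from it, and the helper names and the sets of "safe" characters are assumptions. It assumes an absolute address with a scheme, no userinfo and no IPv6 literal, and it relies on Python's built-in <code>idna</code> codec, which implements the older IDNA 2003 rules. On decoding, only runs of escapes that form valid, printable UTF-8 are turned into text; escapes for ASCII characters and invalid sequences are left as they are.</p>

```python
import re
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Assumption: characters left as-is when percent-encoding, a conservative
# reading of RFC 3986. "%" is kept so existing escapes are not double-encoded.
_SAFE_PATH = "/:@!$&'()*+,;=~%"
_SAFE_QUERY = _SAFE_PATH + "?"

# Runs of percent-escapes that stand for non-ASCII bytes (0x80-0xFF).
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def _netloc(host, port):
    # Assumption: no userinfo and no IPv6 literal in the address.
    return host if port is None else f"{host}:{port}"


def iri_to_uri(iri):
    """Return an ASCII-only URI for an absolute IRI (hypothetical helper)."""
    parts = urlsplit(iri)
    # The host may arrive percent-encoded, so decode it before Punycode.
    host = unquote(parts.hostname or "")
    if host:
        host = host.encode("idna").decode("ascii")
    return urlunsplit((
        parts.scheme,
        _netloc(host, parts.port),
        quote(parts.path, safe=_SAFE_PATH),
        quote(parts.query, safe=_SAFE_QUERY),
        quote(parts.fragment, safe=_SAFE_QUERY),
    ))


def _decode_run(match):
    run = match.group(0)
    try:
        text = bytes.fromhex(run.replace("%", "")).decode("utf-8")
    except UnicodeDecodeError:
        return run  # not valid UTF-8: keep the escapes untouched
    return text if text.isprintable() else run


def uri_to_iri(uri):
    """Return a human-readable IRI for an absolute URI (hypothetical helper)."""
    parts = urlsplit(uri)
    host = unquote(parts.hostname or "")
    try:
        host = host.encode("ascii").decode("idna")
    except UnicodeError:
        pass  # host is already Unicode or is not valid Punycode
    return urlunsplit((
        parts.scheme,
        _netloc(host, parts.port),
        _NON_ASCII_ESCAPES.sub(_decode_run, parts.path),
        _NON_ASCII_ESCAPES.sub(_decode_run, parts.query),
        _NON_ASCII_ESCAPES.sub(_decode_run, parts.fragment),
    ))
```

    <p>Calling <code>iri_to_uri("http://сайт.рф/путь?запрос")</code> should give the href value from the example above, and <code>uri_to_iri</code> applied to that result should give back the readable form. A stray <code>%</code> that is not part of an escape is passed through unchanged, so a stricter version would need to validate escapes first.</p>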