
plurals
  1. PO
    primarykey
    data
    text
    <p>You need to unescape HTML entities, and URL-unquote.<br> The standard library has <a href="http://docs.python.org/library/htmlparser.html?highlight=htmlparser#HTMLParser" rel="nofollow"><code>HTMLParser</code></a> and <a href="http://docs.python.org/library/urllib2.html?highlight=urllib2#urllib2" rel="nofollow"><code>urllib2</code></a> to help with those tasks.</p>
<pre><code>import HTMLParser, urllib2

markup = '''&lt;a href="mailto:lad%20at%20maestro%20dot%20com"&gt;
&lt;em&gt;ada&amp;#x40;graphics.maestro.com&lt;/em&gt;
&lt;em&gt;mel&amp;#x40;graphics.maestro.com&lt;/em&gt;'''

result = HTMLParser.HTMLParser().unescape(urllib2.unquote(markup))
for line in result.split("\n"):
    print(line)
</code></pre>
    <p>Result:</p>
<pre><code>&lt;a href="mailto:lad at maestro dot com"&gt;
&lt;em&gt;ada@graphics.maestro.com&lt;/em&gt;
&lt;em&gt;mel@graphics.maestro.com&lt;/em&gt;
</code></pre>
    <hr>
    <p>Edit:<br> If your pages can contain non-ASCII characters, you'll need to take care to decode on input and encode on output.<br> The sample file you uploaded has charset set to <code>cp-1252</code>, so let's try decoding from that to Unicode:</p>
<pre><code>import codecs

with codecs.open(filename, encoding="cp1252") as fin:
    decoded = fin.read()

result = HTMLParser.HTMLParser().unescape(urllib2.unquote(decoded))

with codecs.open('/output/file.html', 'w', encoding='cp1252') as fou:
    fou.write(result)
</code></pre>
    <hr>
    <p>Edit2:<br> If you don't care about the non-ASCII characters you can simplify a bit:</p>
<pre><code>with open(filename) as fin:
    decoded = fin.read().decode('ascii','ignore')
...
</code></pre>
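The answer above targets Python 2, where the `HTMLParser` and `urllib2` modules exist under those names. For readers on Python 3, a minimal sketch of the same two steps might look like the following. It assumes `html.unescape` and `urllib.parse.unquote` as the Python 3 counterparts; the helper names `decode_markup` and `convert_file` are hypothetical and not part of the original answer.

```python
import html
from urllib.parse import unquote

# Same sample markup as in the answer, written out as raw text.
markup = '''<a href="mailto:lad%20at%20maestro%20dot%20com">
<em>ada&#x40;graphics.maestro.com</em>
<em>mel&#x40;graphics.maestro.com</em>'''


def decode_markup(text):
    """URL-unquote first, then unescape HTML entities (same order as the answer)."""
    return html.unescape(unquote(text))


def convert_file(src, dst, encoding="cp1252"):
    """Hypothetical helper: decode on input, transform, encode on output.

    Writing can raise UnicodeEncodeError if an unescaped entity yields a
    character that the target encoding cannot represent.
    """
    with open(src, encoding=encoding) as fin:
        decoded = fin.read()
    with open(dst, "w", encoding=encoding) as fout:
        fout.write(decode_markup(decoded))


result = decode_markup(markup)
for line in result.split("\n"):
    print(line)
```

In Python 3, `open` takes an `encoding` argument directly, so `codecs.open` is not needed for the file case.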
 
