<p>The SAX parser in Python 2.6 should be able to parse UTF-8 without mangling it. You've left out the ContentHandler you're using with the parser, but if that content handler attempts to print any non-ASCII characters to your console, that will cause a crash.</p>
<p>For example, say I have this XML doc:</p>
<pre><code>&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;test&gt;
    &lt;name&gt;Champs-Élysées&lt;/name&gt;
&lt;/test&gt;
</code></pre>
<p>And this parsing apparatus:</p>
<pre><code>import xml.sax

class MyHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print "StartElement: %s" % name

    def endElement(self, name):
        print "EndElement: %s" % name

    def characters(self, ch):
        #print "Characters: '%s'" % ch
        pass

parser = xml.sax.make_parser()
parser.setContentHandler(MyHandler())

for line in open('text.xml', 'r'):
    parser.feed(line)
</code></pre>
<p>This will parse just fine, and the content will indeed preserve the accented characters in the XML. The only issue is the line in <code>def characters()</code> that I've commented out. Running in the console in Python 2.6, that line will produce the exception you're seeing, because the <code>print</code> statement must convert the characters to ASCII for output.</p>
<p>You have 3 possible solutions:</p>
<p><strong>One</strong>: Make sure your terminal supports Unicode, then create a <code>sitecustomize.py</code> file in your <code>site-packages</code> directory and set the default character set to UTF-8:</p>
<pre><code>import sys
sys.setdefaultencoding('utf-8')
</code></pre>
<p><strong>Two</strong>: Don't print the output to the terminal (tongue-in-cheek).</p>
<p><strong>Three</strong>: Normalize the output using <code>unicodedata.normalize</code> to convert non-ASCII chars to ASCII equivalents, or <code>encode</code> the chars to ASCII for text output: <code>ch.encode('ascii', 'replace')</code>. Of course, using this method you won't be able to properly evaluate the text, because the output no longer matches the original characters.</p>
<p>Using option one above, your code worked just fine for me in Python 2.5.</p>
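<p>Option three can be sketched roughly as follows. This is only an illustration, not the asker's code: <code>CollectingHandler</code> and <code>to_ascii</code> are hypothetical names, the document is parsed from an in-memory byte string rather than from <code>text.xml</code>, and the snippet is written on the assumption that it should run on both Python 2.6+ and Python 3.3+. The handler stores the character data instead of printing it, and the text is converted to ASCII only at the point where it is written to the console.</p>

```python
# -*- coding: utf-8 -*-
import unicodedata
import xml.sax


class CollectingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler: stores character data instead of printing it."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, ch):
        # SAX may deliver the text in several pieces, so collect them all.
        self.chunks.append(ch)


def to_ascii(text):
    """Hypothetical helper: decompose accented characters, drop the accents."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# The sample document from the answer, as UTF-8 encoded bytes.
doc = (u'<?xml version="1.0" encoding="utf-8"?>'
       u'<test><name>Champs-\u00c9lys\u00e9es</name></test>').encode('utf-8')

handler = CollectingHandler()
xml.sax.parseString(doc, handler)
text = u''.join(handler.chunks)  # still the original Unicode text

stripped = to_ascii(text)                                   # accents removed
replaced = text.encode('ascii', 'replace').decode('ascii')  # '?' placeholders

# Both strings are pure ASCII, so printing them is safe on any console.
print(stripped)
print(replaced)
```

<p>Keeping <code>text</code> as Unicode and converting only for display means the parsed content itself stays intact for any later comparison or processing.</p>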
 
