<p>The main source of problems I've had working with unicode strings is mixing UTF-8 encoded byte strings with unicode ones.</p>
<p>For example, consider the following scripts.</p>
<p>two.py</p>
<pre><code># encoding: utf-8
name = 'helló wörld from two'
</code></pre>
<p>one.py</p>
<pre><code># encoding: utf-8
from __future__ import unicode_literals
import two
name = 'helló wörld from one'
print name + two.name
</code></pre>
<p>The output of running <code>python one.py</code> is:</p>
<pre><code>Traceback (most recent call last):
  File "one.py", line 5, in &lt;module&gt;
    print name + two.name
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
</code></pre>
<p>In this example, <code>two.name</code> is a UTF-8 encoded byte string (not unicode) because <code>two.py</code> did not import <code>unicode_literals</code>, while <code>one.name</code> is a unicode string. When you mix the two, Python tries to convert the encoded string to unicode by decoding it as ASCII, and fails on the first non-ASCII byte. It would work if you did <code>print name + two.name.decode('utf-8')</code>.</p>
<p>The same thing can happen if you encode a string and later mix it with unicode strings. For example, this works:</p>
<pre><code># encoding: utf-8
html = '&lt;html&gt;&lt;body&gt;helló wörld&lt;/body&gt;&lt;/html&gt;'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html
</code></pre>
<p>Output:</p>
<pre><code>DEBUG: &lt;html&gt;&lt;body&gt;helló wörld&lt;/body&gt;&lt;/html&gt;
</code></pre>
<p>But after adding the <code>unicode_literals</code> import it does NOT:</p>
<pre><code># encoding: utf-8
from __future__ import unicode_literals
html = '&lt;html&gt;&lt;body&gt;helló wörld&lt;/body&gt;&lt;/html&gt;'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html
</code></pre>
<p>Output:</p>
<pre><code>Traceback (most recent call last):
  File "test.py", line 6, in &lt;module&gt;
    print 'DEBUG: %s' % html
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
</code></pre>
<p>It fails because <code>'DEBUG: %s'</code> is now a unicode string, so Python tries to decode the encoded <code>html</code> as ASCII before formatting. Two ways to fix the print are <code>print str('DEBUG: %s') % html</code> or <code>print 'DEBUG: %s' % html.decode('utf-8')</code>.</p>
<p>I hope this helps you understand the potential gotchas when using unicode strings.</p>
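<p>One way to avoid both failures is to decode byte strings once, at the point where they enter your code, and keep everything unicode after that. Below is a minimal sketch of that idea. The helpers <code>to_text</code> and <code>to_bytes</code> and the alias <code>text_type</code> are hypothetical names introduced for illustration, not part of the scripts above, and the sketch assumes the incoming bytes are UTF-8. It is written to run on both Python 2 and Python 3.</p>

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Hypothetical alias: the unicode string type
# (`unicode` on Python 2, `str` on Python 3).
text_type = type(u"")


def to_text(value, encoding="utf-8"):
    """Return `value` as a unicode string, decoding byte strings if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return `value` as a byte string, encoding unicode strings if needed."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


# Stand-in for `two.name`: an already-encoded UTF-8 byte string.
encoded_name = "helló wörld from two".encode("utf-8")
name = "helló wörld from one"

# Decode explicitly before mixing, instead of relying on an implicit
# ASCII decode (Python 2) or hitting a TypeError (Python 3).
combined = name + to_text(encoded_name)

# Same idea for the DEBUG example: format in unicode, encode only on output.
html_bytes = to_bytes("<html><body>helló wörld</body></html>")
debug_line = "DEBUG: %s" % to_text(html_bytes)
```

<p>With this approach, encoding happens only at the output boundary (for example, when writing to a file or socket), so unicode and byte strings never meet in the middle of the program.</p>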
 
