<p>First, <code>ISO-8859-1</code> isn't a valid coding declaration. You want <code>iso-8859-1</code>. According to <a href="http://docs.python.org/2/library/codecs.html" rel="nofollow">the docs</a>, you can call this codec <code>latin_1</code>, <code>iso-8859-1</code>, <code>iso8859-1</code>, <code>8859</code>, <code>cp819</code>, <code>latin</code>, <code>latin1</code>, or <code>L1</code>, but not <code>ISO-8859-1</code>.</p>
<p>It looks like <code>codecs.lookup</code> bends over backwards to accept bad input, including doing case-insensitive lookups. If you trace <a href="http://hg.python.org/cpython/file/2.7/Lib/codecs.py" rel="nofollow"><code>codecs.lookup</code></a> through <a href="http://hg.python.org/cpython/file/2.7/Modules/_codecsmodule.c" rel="nofollow"><code>_codecs.lookup</code></a> to <a href="http://hg.python.org/cpython/file/2.7/Python/codecs.c" rel="nofollow"><code>_PyCodec_Lookup</code></a>, you can see this comment:</p>
<pre><code>/* Convert the encoding to a normalized Python string: all
   characters are converted to lower case, spaces and hyphens are
   replaced with underscores. */
</code></pre>
<p>But source file decoding doesn't go through the same codec lookup process. Because it happens at compile time rather than runtime, there's no reason for it to do so. (At any rate, saying "It seems to work, even though the docs say it's wrong… so why doesn't it quite work right?" is kind of silly in the first place.)</p>
<p>To demonstrate, if I create two Latin-1 files:</p>
<p>badcode.py:</p>
<pre><code># -*- coding: ISO-8859-1 -*-
print u"Vérifier l'affichage de cette chaîne"
</code></pre>
<p>goodcode.py:</p>
<pre><code># -*- coding: iso-8859-1 -*-
print u"Vérifier l'affichage de cette chaîne"
</code></pre>
<p>The first one fails, the second succeeds.</p>
<p>Now, why does it "work" when it's going to the console but raise an exception when piped?</p>
<p>Well, when you print to a Windows console or a Unix TTY, Python has some code that tries to guess the right encoding to use. (I'm not sure what happens under the covers on Windows; it might even be using UTF-16 output, for all I know.) When you're not printing to a console/TTY, it can't do this, so you have to specify the encoding explicitly.</p>
<p>You can see some of what's going on by looking at <code>sys.stdout.isatty()</code>, <code>sys.stdout.encoding</code>, and <code>sys.getdefaultencoding()</code>. Here's what I see on a Mac in different cases (the last value in each row is the result of printing the string):</p>
<ul>
<li>Python 2, no redirect: <code>True, UTF-8, ascii, Vérifier</code></li>
<li>Python 3, no redirect: <code>True, UTF-8, utf-8, Vérifier</code></li>
<li>Python 2, redirect: <code>False, None, ascii, UnicodeEncodeError</code></li>
<li>Python 3, redirect: <code>False, UTF-8, utf-8, Vérifier</code></li>
</ul>
<p>If <code>isatty()</code> is true, <code>encoding</code> will be an appropriate encoding for the TTY; otherwise, <code>encoding</code> will be the default value, which is <code>None</code> (meaning <code>ascii</code>) in 2.x, and the locale's preferred encoding (what <code>locale.getpreferredencoding()</code> reports) in 3.x, rather than anything based on <code>getdefaultencoding()</code>.
That means that if you try to print Unicode while <code>stdout</code> is not a TTY in 2.x, it will try to encode it as <code>ascii</code>, <code>strict</code>, which will fail if you've got non-ASCII characters.</p>
<p>If you somehow know what codec you want to use, you can deal with this manually by checking <code>isatty()</code> and encoding to that codec (or even to <code>ascii</code>, <code>ignore</code> instead of <code>strict</code>, if you prefer) whenever you print, instead of trying to print Unicode. (If you know what codec you want, you may want to do this even in 3.x: a UTF-8 default isn't much help if you're trying to generate, say, Windows-1252 files…)</p>
<p>The console-versus-pipe difference actually has nothing to do with Latin-1 or the coding declaration. Try this out:</p>
<p>nocode.py:</p>
<pre><code>print u"V\xe9rifier l'affichage de cette cha\xeene"
print u"V\u00e9rifier l'affichage de cette cha\u00eene"
</code></pre>
<p>I get the Unicode strings encoded to UTF-8 for my Mac terminal and (apparently) to Windows-1252 for my Windows cmd window, but an exception when redirecting to a file.</p>
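<p>To see the runtime leniency of <code>codecs.lookup</code> for yourself, you can ask the codec registry which codec each spelling resolves to. This is a minimal sketch for Python 3 (the answer's own examples are Python 2). <code>canonical_codec_name</code> is a hypothetical helper name, and the spellings are the ones listed from the docs plus the upper-case form used in badcode.py.</p>

```python
import codecs

# Spellings listed in the codecs docs, plus the upper-case form from badcode.py.
SPELLINGS = [
    "ISO-8859-1", "iso-8859-1", "latin_1", "iso8859-1",
    "8859", "cp819", "latin", "latin1", "L1",
]


def canonical_codec_name(name):
    """Return the canonical name the runtime codec registry gives `name`.

    codecs.lookup lower-cases the name and turns spaces and hyphens into
    underscores before consulting the alias table in the `encodings`
    package, so many different spellings reach the same codec.
    """
    return codecs.lookup(name).name


if __name__ == "__main__":
    for spelling in SPELLINGS:
        print(f"{spelling!r:14} resolves to {canonical_codec_name(spelling)!r}")
```

<p>Every spelling should map to the same codec, which Python 3 reports as <code>iso8859-1</code>. That is the point of the exercise: because the runtime lookup normalizes names, a name that <code>codecs.lookup</code> accepts is not, by itself, evidence that the same string is a valid coding declaration.</p>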
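<p>The manual workaround described in the answer (check <code>isatty()</code>, and when output isn't a terminal, encode to a codec you picked yourself) might look something like this in Python 3. It's a sketch under assumptions: <code>write_line</code> is a hypothetical helper, the default codec <code>windows-1252</code> is only an example choice, and the bytes are written through <code>stream.buffer</code>, which the standard <code>sys.stdout</code> has but a pure text stream such as <code>io.StringIO</code> does not.</p>

```python
import io
import sys


def write_line(text, stream=None, codec="windows-1252", errors="strict"):
    """Write one line of Unicode text, picking the codec explicitly.

    Hypothetical helper, not a standard API. On a TTY it trusts the encoding
    Python already chose for the terminal; otherwise it encodes with `codec`
    (and `errors`, e.g. "strict" or "ignore") and writes raw bytes.
    """
    if stream is None:
        stream = sys.stdout
    if stream.isatty():
        stream.write(text + "\n")
        return
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # A pure text stream such as io.StringIO has no bytes to control.
        stream.write(text + "\n")
        return
    data = (text + "\n").encode(codec, errors)
    stream.flush()       # push out any text already queued on this stream
    buffer.write(data)   # bypass the text layer and its default encoding
    buffer.flush()


if __name__ == "__main__":
    message = "V\u00e9rifier l'affichage de cette cha\u00eene"
    # The diagnostics from the answer, on stderr so a redirect stays clean.
    print(sys.stdout.isatty(), sys.stdout.encoding, sys.getdefaultencoding(),
          file=sys.stderr)
    write_line(message)
    # Simulated redirect: an in-memory byte stream is never a TTY.
    fake = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    write_line(message, stream=fake, codec="ascii", errors="ignore")
    print(fake.buffer.getvalue(), file=sys.stderr)
```

<p>Running it as <code>python script.py &gt; out.txt</code> takes the non-TTY branch and writes Windows-1252 bytes to the file. Passing <code>codec="ascii", errors="ignore"</code> gives the lossy fallback mentioned in the answer. The diagnostics go to <code>stderr</code> so they stay out of the redirected output.</p>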