Write unicode content and unicode file name in Windows
    <pre><code>#source file is encoded in utf8
import urllib2
import re
req = urllib2.urlopen('http://people.w3.org/rishida/scripts/samples/hungarian.html')
c = req.read()#.decode('utf-8')
p = r'title="This is Latin script \(Hungarian language\)"&gt;(.+)'
text = re.search(p, c).group(1)
name = text[:10]+'.txt' #file name will have special chars in it
f = open(name, 'wb')
f.write(text) #content of file will have special chars in it
f.close()
x = raw_input('done')
</code></pre>
    <p>As you can see, the script does a few things:</p>
    <ul>
    <li>Reads content that is known to have unicode characters from a webpage into a variable. (The source file is saved in utf-8, but this should not make a difference unless unicode strings are actually being defined in the source code. Here the unicode string is assigned to a variable dynamically, so the encoding of the source file shouldn't matter in this scenario.)</li>
    <li>Writes a file with a name containing unicode characters</li>
    <li>Writes unicode content into this file as well</li>
    </ul>
    <p>Here's the weird behavior I get (Windows 7, Python 2.7). When I don't use the decode function:</p>
    <pre><code>c = req.read()
</code></pre>
    <p>The NAME of the file will come out as gibberish, but the CONTENT of the file will come out readable (that is, you can see the correct unicode Hungarian characters).</p>
    <p>Yet, when I USE the decode function:</p>
    <pre><code>c = req.read().decode('utf-8')
</code></pre>
    <p>It will NOT ERROR on opening the file (really creating it, with 'w' mode), and the resulting file's NAME will be readable: it now shows the correct unicode characters.</p>
    <p>So far so good, right? Well, then it WILL ERROR on trying to write the unicode content to the file:</p>
    <pre><code>f.write(text) #content of file will have special chars in it
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range(128)
</code></pre>
    <p>You see, I can't seem to have my cake and eat it too. Either I can correctly write the NAME of the file or I can correctly write the CONTENT of the file.</p>
    <p><strong>How can I do both?</strong></p>
    <p>I've also tried writing the file with</p>
    <pre><code>f = codecs.open(name, encoding='utf-8', mode='wb')
</code></pre>
    <p>But it also errors.</p>
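One possible way to get both, offered as a minimal sketch and not a confirmed fix for the exact setup above: decode the downloaded bytes once, keep the result as a unicode string for the file name, and let `io.open` in text mode encode the content on write. The sample bytes below are a hypothetical stand-in for `req.read()`, and the temporary directory is only there so the sketch does not touch the working directory; neither comes from the original script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for req.read(): UTF-8 bytes as they would arrive
# from the network (Hungarian sample text, not the page from the question).
raw = u'\xc1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p'.encode('utf-8')

# Decode once; from here on work with a unicode string.
text = raw.decode('utf-8')

# A unicode file name lets Python pass the name to the OS as unicode.
name = text[:10] + u'.txt'
path = os.path.join(tempfile.mkdtemp(), name)  # temp dir is an assumption

# Text mode with an explicit encoding: unicode goes in, UTF-8 bytes hit disk.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the file back to check that the content survived the round trip.
with io.open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()
```

The idea is that the built-in Python 2 `open` returns a byte-oriented file, so writing a unicode string to it triggers an implicit ASCII encode, which is where the `UnicodeEncodeError` above comes from. Encoding explicitly with `f.write(text.encode('utf-8'))` on a file opened with `'wb'` would be an alternative with the same effect.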