  1. How to write a UTF-8 string to a UTF-8 file on a UTF-8 machine with Python
    <p>Not for the first time, this has confused me:</p>
<p>Open the file with <code>codecs.open</code>:</p>
<pre><code>cfh = codecs.open('/tmp/ddfh', 'wb', 'utf-8')
</code></pre>
<p>Then try to write the string <code>sa</code>:</p>
<pre><code>In [109]: sa
Out[109]: '\xe6\x96\xb0 \xe9\x97\xbb\xe3\x80\x80\xe7\xbd\x91 \xe9\xa1\xb5\xe3\x80\x80\xe8\xb4\xb4 \xe5\x90\xa7\xe3\x80\x80\xe7\x9f\xa5 \xe9\x81\x93\xe3\x80\x80\xe9\x9f\xb3 \xe4\xb9\x90\xe3\x80\x80\xe5\x9b\xbe \xe7\x89\x87\xe3\x80\x80\xe8\xa7\x86 \xe9\xa2\x91\xe3\x80\x80\xe5\x9c\xb0 \xe5\x9b\xbe'

In [110]: print sa
新 闻 网 页 贴 吧 知 道 音 乐 图 片 视 频 地 图

In [111]: sa.encode()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/za/tmp/&lt;ipython-input-111-dea686030e89&gt; in &lt;module&gt;()
----&gt; 1 sa.encode()

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

In [112]: sa.decode()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/za/tmp/&lt;ipython-input-112-a79b22010b0e&gt; in &lt;module&gt;()
----&gt; 1 sa.decode()

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

In [113]: sa.encode('utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/za/tmp/&lt;ipython-input-113-ed97f8f61eb5&gt; in &lt;module&gt;()
----&gt; 1 sa.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

In [114]: sa.decode('utf-8')
Out[114]: u'\u65b0 \u95fb\u3000\u7f51 \u9875\u3000\u8d34 \u5427\u3000\u77e5 \u9053\u3000\u97f3 \u4e50\u3000\u56fe \u7247\u3000\u89c6 \u9891\u3000\u5730 \u56fe'

In [115]: cfh.write(sa.decode('utf-8'))
</code></pre>
<p>This works on the machine above, but <strong>FAILED</strong> on another machine running the same Ubuntu release with the same $LANG setting. I keep hitting "'ascii' codec can't ...."</p>
<p><strong>Who can point me to a good doc?</strong> The official documentation for the <code>codecs</code> module has not helped me.</p>
<p>===</p>
<p>The problem comes from this code:</p>
<pre><code># encoding=utf-8
# ......
def write_video_info_file(folder, filename, infos):
    # infos: a list of lists; lines of text grouped by topic, results of language translations.
    absfn = os.path.join(folder, filename)
    with codecs.open(absfn, mode='wb', encoding='utf-8') as fh:
        for vinfo in infos:
            for v in vinfo:
                fh.write(v)
            fh.write("\n\n" + vi_delimit + "\n\n")
</code></pre>
<p>This tested OK on my local machine, but after it was deployed to a remote machine it raised many <code>UnicodeDecodeError: 'ascii' codec can't</code> errors.</p>
<p>After that I tried nearly every <code>mode=</code> value, and also plain <code>open</code> without <code>codecs</code>.</p>
<pre><code>$ echo $LANG
# en_US.UTF-8
</code></pre>
<p>Python 2.7.3</p>
<p>Ubuntu 12.04</p>
<p>LANG=en_US.UTF-8</p>
<p>LANGUAGE=</p>
<p>LC_ALL=</p>
<p>===</p>
<p>I found the solution: use this to make sure every byte string is decoded from UTF-8 to <code>unicode</code> before it is written:</p>
<pre><code>if isinstance(mystring, str):
    mystring = mystring.decode('utf-8')
</code></pre>
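<p>The decode-before-write fix can be sketched as a small helper. In Python 2, calling <code>.encode()</code> on a byte <code>str</code>, or writing one to a file opened with <code>codecs.open</code>, first decodes it implicitly with the default ASCII codec, which is where the <code>'ascii' codec can't decode</code> message comes from. The names <code>to_text</code> and <code>write_lines</code> below are hypothetical, not part of the original code, and the sketch assumes every byte string in the input really is UTF-8. It uses <code>io.open</code> rather than <code>codecs.open</code> so that the same code runs on Python 2.7 and Python 3.</p>

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings (hypothetical helper)."""
    # On Python 2, bytes is an alias of str, so this is the same check as
    # isinstance(mystring, str); on Python 3 it matches bytes objects.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def write_lines(path, lines, encoding="utf-8"):
    """Write each item of lines to path, one per line (hypothetical helper)."""
    # A file opened with io.open and an encoding accepts text only,
    # so every item is normalised to text before it is written.
    with io.open(path, mode="w", encoding=encoding) as fh:
        for line in lines:
            fh.write(to_text(line, encoding) + u"\n")


if __name__ == "__main__":
    # Assumption: a mix of UTF-8 byte strings and text, as in the question.
    mixed = [b"\xe6\x96\xb0 \xe9\x97\xbb", u"\u7f51 \u9875"]
    path = os.path.join(tempfile.mkdtemp(), "ddfh")
    write_lines(path, mixed)
    with io.open(path, encoding="utf-8") as fh:
        assert fh.read() == u"\u65b0 \u95fb\n\u7f51 \u9875\n"
```

<p>Normalising at the point of writing means the function behaves the same whether callers pass byte strings or text. That may be why the original code behaved differently on two machines with identical locale settings: the input data, not <code>$LANG</code>, would decide whether an implicit ASCII decode happens.</p>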