StackOverflow2013



UnicodeEncodeErrors while using DictWriter for utf-8
<p>I am trying to write a dictionary containing utf-8 strings to a CSV. I'm following the instructions from <a href="https://stackoverflow.com/questions/5838605/python-dictwriter-writing-utf-8-encoded-csv-files">here</a>. However, despite carefully encoding and decoding these utf-8 strings, I am getting <code>UnicodeEncodeError</code>s involving the 'ascii' codec.</p>
<p>I have a list of dictionaries describing changes to Wikipedia articles; their values are strings and ints. The list below corresponds to <a href="http://en.wikipedia.org/w/index.php?diff=121862749&oldid=prev" rel="nofollow noreferrer">this change</a>, for example:</p>
<pre><code>edgelist = [{'articleName': 'Barack Obama', 'editorName': 'Schonbrunn', 'revID': '121844749', 'bytesAdded': '183'},
            {'articleName': 'Barack Obama', 'editorName': 'Eep\xc2\xb2', 'revID': '121862749', 'bytesAdded': '107'}]
</code></pre>
<p>The problem is <code>edgelist[1]['editorName']</code>. It has type <code>str</code>, and <code>edgelist[1]['editorName'].decode('utf-8')</code> is <code>u'Eep\xb2'</code>.</p>
<p>The code I am attempting is:</p>
<pre><code>import codecs
import csv

_ENCODING = 'utf-8'

def dictToCSV(edgelist, output_file):
    with codecs.open(output_file, 'wb', encoding=_ENCODING) as f:
        w = csv.DictWriter(f, sorted(edgelist[0].keys()))
        w.writeheader()
        for d in edgelist:
            # turn int values into byte strings so every value can be decoded below
            for k, v in d.items():
                if type(v) == int:
                    d[k] = str(v).encode(_ENCODING)
            w.writerow({k: v.decode(_ENCODING) for k, v in d.items()})
</code></pre>
<p>This raises:</p>
<pre><code>dictToCSV(edgelist,'test2.csv')
  File "csv_to_charts.py", line 129, in dictToCSV
    w.writerow({k:v.decode(_ENCODING,'ignore') for k,v in d.items()})
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 148, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 3: ordinal not in range(128)
</code></pre>
<p>Other variations of the final, problematic line (swapping <code>decode</code> for <code>encode</code>, or dropping the conversion entirely) also raise errors:</p>
<ol>
<li><code>w.writerow({k:v.encode(_ENCODING) for k,v in d.items()})</code> raises <code>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 56: ordinal not in range(128)</code></li>
<li><code>w.writerow({k:v for k,v in d.items()})</code> raises <code>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 56: ordinal not in range(128)</code></li>
<li>Following <a href="https://stackoverflow.com/questions/3285578/csv-dictwriter-unicode-and-utf-8">this</a>, I changed <code>with codecs.open(output_file,'wb',encoding=_ENCODING) as f:</code> to <code>with open(output_file,'wb') as f:</code> and still receive the same error.</li>
</ol>
<p>If I exclude the list element(s) or keys containing this problematic string, the script works fine.</p>
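The 'ascii' in these messages points to an implicit conversion with Python 2's default ASCII codec: Python 2.7's csv module does not directly support unicode input, so a unicode value such as u'Eep\xb2' most likely ends up encoded as ASCII, and a UTF-8 byte string such as 'Eep\xc2\xb2' that is treated as text ends up decoded as ASCII. The sketch below assumes Python 3 (the question itself uses 2.7) and performs both conversions explicitly so the same two errors appear on demand; the helper `ascii_error` is hypothetical and exists only for illustration.

```python
# Python 3 sketch: reproduce the two ASCII conversion errors explicitly.
# In Python 2 these conversions happen implicitly inside csv/codecs.

name_bytes = b'Eep\xc2\xb2'             # UTF-8 bytes, like the Python 2 str value
name_text = name_bytes.decode('utf-8')  # decoded text, equal to 'Eep\xb2'


def ascii_error(convert):
    """Hypothetical helper: run convert() and return the UnicodeError message, or None."""
    try:
        convert()
    except UnicodeError as exc:  # covers UnicodeEncodeError and UnicodeDecodeError
        return str(exc)
    return None


# Text -> ASCII bytes fails on U+00B2, matching the UnicodeEncodeError in the traceback.
encode_msg = ascii_error(lambda: name_text.encode('ascii'))

# UTF-8 bytes -> text via ASCII fails on the lead byte 0xc2, matching items 1 and 2.
decode_msg = ascii_error(lambda: name_bytes.decode('ascii'))

# Using UTF-8 in both directions round-trips cleanly.
assert name_text.encode('utf-8') == name_bytes

print(encode_msg)
print(decode_msg)
```

Here the decode error reports position 3 because the bytes are checked on their own; the question's position 56 presumably refers to a longer string that contained the value.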

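For comparison, here is a minimal sketch of the same export under Python 3, where csv works on text rather than bytes. This assumes a Python 3 environment; under Python 2.7 the commonly used workaround is instead to encode every unicode value to a UTF-8 byte string before `writerow` and to open the file with plain `open(..., 'wb')`. In Python 3 the file object performs the single text-to-UTF-8 step, so no manual encode or decode is needed. The name `dict_to_csv` and the int `bytesAdded` value are illustrative, not from the question.

```python
import csv

# Sample rows adapted from the question; in Python 3 the editor name is
# ordinary text ('Eep\u00b2'), not UTF-8 bytes.
edgelist = [
    {'articleName': 'Barack Obama', 'editorName': 'Schonbrunn',
     'revID': '121844749', 'bytesAdded': '183'},
    {'articleName': 'Barack Obama', 'editorName': 'Eep\u00b2',
     'revID': '121862749', 'bytesAdded': 107},  # changed to an int for illustration
]


def dict_to_csv(rows, output_file, encoding='utf-8'):
    """Write a list of dicts to CSV (hypothetical Python 3 sketch).

    csv.DictWriter writes str; the file object encodes it once to
    `encoding`, so values need no manual encode/decode.
    """
    fieldnames = sorted(rows[0].keys())
    # newline='' lets the csv module control line endings, as its docs recommend.
    with open(output_file, 'w', newline='', encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)  # non-string values such as ints are passed through str()
```

Calling `dict_to_csv(edgelist, 'test2.csv')` writes the header `articleName,bytesAdded,editorName,revID` followed by both rows, with the superscript two stored as the UTF-8 bytes `\xc2\xb2`.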