Python encoding conversion
<p>I wrote a Python script that processes CSV files containing non-ASCII characters encoded in UTF-8. However, the encoding of the output is broken. From this in the input:</p>
<pre><code>"d\xc4\x9bjin hornictv\xc3\xad"
</code></pre>
<p>I get this in the output:</p>
<pre><code>"d\xe2\x99\xafjin hornictv\xc2\xa9\xc6\xaf"
</code></pre>
<p>Can you suggest where the encoding error might come from? Have you seen similar behaviour before?</p>
<p>EDIT: I'm using the <code>csv</code> standard library module with the <code>UnicodeWriter</code> class featured in the <a href="http://docs.python.org/library/csv.html" rel="nofollow">docs</a>. I use Python version 2.6.6.</p>
<p>EDIT 2: The code to reproduce the behaviour:</p>
<pre><code>#!/usr/bin/env python
#-*- coding:utf-8 -*-

import csv
from pymarc import MARCReader # The pymarc package, available on PyPI: http://pypi.python.org/pypi/pymarc/2.71
from UnicodeWriter import UnicodeWriter # The UnicodeWriter from: http://docs.python.org/library/csv.html

def getRow(tag, record):
    if record[tag].is_control_field():
        row = [tag, record[tag].value()]
    else:
        row = [tag] + record[tag].subfields
    return row

inputFile = open("input.mrc", "r")
outputFile = open("output.csv", "wb")
reader = MARCReader(inputFile, to_unicode = True)
writer = UnicodeWriter(outputFile, delimiter = ",", quoting = csv.QUOTE_MINIMAL)

for record in reader:
    if bool(record["001"]):
        tags = [field.tag for field in record.get_fields()]
        tags.sort()
        for tag in tags:
            writer.writerow(getRow(tag, record))

inputFile.close()
outputFile.close()
</code></pre>
<p>The input data is <a href="http://dl.dropbox.com/u/893551/input.mrc" rel="nofollow">available here</a> (large file).</p>
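<p>To narrow down where the corruption happens, it may help to decode both byte strings and look at the characters they actually contain. The following is a minimal diagnostic sketch, not a fix. It assumes Python 3 syntax (the script above targets Python 2.6.6), uses only the standard library, and the helper name <code>describe</code> is made up for illustration.</p>

```python
import unicodedata

# Byte strings copied verbatim from the question (Python 3 bytes literals).
input_bytes = b"d\xc4\x9bjin hornictv\xc3\xad"
output_bytes = b"d\xe2\x99\xafjin hornictv\xc2\xa9\xc6\xaf"

# Both sequences are decoded strictly as UTF-8; a UnicodeDecodeError here
# would mean the bytes are not valid UTF-8 at all.
input_text = input_bytes.decode("utf-8")
output_text = output_bytes.decode("utf-8")


def describe(text):
    """Return (code point, Unicode name) pairs for the non-ASCII characters.

    Hypothetical helper, introduced only for this illustration.
    """
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 127
    ]


if __name__ == "__main__":
    print(describe(input_text))
    print(describe(output_text))
```

<p>Both byte strings decode cleanly as UTF-8, but to different characters: the input contains the expected Czech letters, while the output contains unrelated symbols. That suggests the writer is encoding to UTF-8 correctly and that the substitution happens earlier, when the input bytes are turned into Unicode. One possibility, stated here as an assumption rather than a confirmed diagnosis, is that the reader interprets the record bytes in a character set other than UTF-8.</p>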