Text editors show Python-created UTF-8 files as gibberish
<p>This is my first question here, so apologies in advance if the format is not what is expected.</p>
<p>I have a small utility that reads ISO-8859-9 text files and produces UTF-8 copies of them. The method I found uses the encode and decode methods. When I implement it this way (the way of the elders), text editors show the non-ASCII characters as unrelated characters.</p>
<p>The twist is that the files are written correctly. To check, I created the same file by hand in TextEdit on the Mac. The converted file's hex dump and md5sum are identical to those of the hand-created one. However, both TextEdit and KWrite (or Kate) on KDE show absurd characters even if I choose UTF-8 as the input encoding. Why is this happening, and how can I solve it?</p>
<p>Thanks a lot.</p>
<p>Update:</p>
<p>od -c outputs are below:</p>
<p>First of all, the ISO-8859-9 file:</p>
<pre><code>0000000 374 360 i 376 347 366 334 320 335 336 307 326 T e s t
0000020 T e s t
0000024
</code></pre>
<p>The Python-created UTF-8 file:</p>
<pre><code>0000000 ü ** ğ ** i ş ** ç ** ö ** Ü ** Ğ ** İ
0000020 ** Ş ** Ç ** Ö ** T e s t T e s t
0000037
</code></pre>
<p>The hand-created UTF-8 file:</p>
<pre><code>0000000 ü ** ğ ** i ş ** ç ** ö ** Ü ** Ğ ** İ
0000020 ** Ş ** Ç ** Ö ** T e s t T e s t
0000037
</code></pre>
<p>The actual code:</p>
<pre><code>def convert_file(path_of_text_file):
    try:
        original_file = open(path_of_text_file, 'rb')
        file_contents = unicode(original_file.read(), 'iso-8859-9')
        original_file.close()
        new_file = open("untitled2.txt", 'w+b')
        new_file.write(file_contents.encode('utf8'))
        new_file.close()
    except IOError:
        pass
</code></pre>
<p>Also, yes, the handcrafted file opens just fine. It also has the same md5sum and hex output as the Python-generated one.</p>
<p>od -xc outputs:</p>
<p>Again, the original ISO-8859-9 file:</p>
<pre><code>0000000 f0fc fe69 f6e7 d0dc dedd d6c7 6554 7473
        374 360 i 376 347 366 334 320 335 336 307 326 T e s t
0000020 6554 7473
        T e s t
0000024
</code></pre>
<p>The Python-generated UTF-8 file:</p>
<pre><code>0000000 bcc3 9fc4 c569 c39f c3a7 c3b6 c49c c49e
        ü ** ğ ** i ş ** ç ** ö ** Ü ** Ğ ** İ
0000020 c5b0 c39e c387 5496 7365 5474 7365 0074
        ** Ş ** Ç ** Ö ** T e s t T e s t
0000037
</code></pre>
<p>The hand-crafted UTF-8 file:</p>
<pre><code>0000000 bcc3 9fc4 c569 c39f c3a7 c3b6 c49c c49e
        ü ** ğ ** i ş ** ç ** ö ** Ü ** Ğ ** İ
0000020 c5b0 c39e c387 5496 7365 5474 7365 0074
        ** Ş ** Ç ** Ö ** T e s t T e s t
0000037
</code></pre>
<p>Another note of interest: BBEdit handles the Python-created files just fine.</p>
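<p>For reference, the decode-then-encode conversion described in the question can be sketched in Python 3 as below. This is only an illustrative sketch, not the code the utility runs: the two-argument signature, the <code>dst_path</code> parameter, and the sample file names are assumptions made for the example. The sample string is the text shown in the od dumps, so the output should come to 31 bytes (octal 37), matching the dumps.</p>

```python
import os
import tempfile


def convert_file(src_path, dst_path, src_encoding="iso-8859-9"):
    """Re-encode a text file from src_encoding to UTF-8 and return the text."""
    # Read raw bytes and decode explicitly, so no platform default is involved.
    with open(src_path, "rb") as src:
        text = src.read().decode(src_encoding)
    # Encode explicitly and write bytes: no BOM and no newline translation.
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))
    return text


if __name__ == "__main__":
    # Hypothetical sample files in a temporary directory.
    workdir = tempfile.mkdtemp()
    src_path = os.path.join(workdir, "sample-iso-8859-9.txt")
    dst_path = os.path.join(workdir, "sample-utf-8.txt")

    # The same text that appears in the od dumps in the question.
    sample = "üğişçöÜĞİŞÇÖTestTest"
    with open(src_path, "wb") as f:
        f.write(sample.encode("iso-8859-9"))

    convert_file(src_path, dst_path)

    with open(dst_path, "rb") as f:
        converted = f.read()
    # Print the byte count and a hex dump for comparison with od output.
    print(len(converted), converted.hex())
```

<p>Comparing the printed hex string with the od -xc output requires swapping each pair of bytes, because od -x prints 16-bit words in the host byte order (little-endian in the dumps above).</p>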