Python 2.7 reading and writing "éèàçê" from utf-8 file
<p>I made this script, which removes all trailing whitespace characters and replaces all badly encoded French characters with the right ones.</p>
<p>Removing the trailing whitespace works, but replacing the French characters does not.</p>
<p>The files to read and write are encoded in UTF-8, so I added the utf-8 declaration at the top of my script, but in the end every bad character (like \u00e9) is replaced by a little square.</p>
<p>Any idea why?</p>
<p>Script:</p>
<pre><code># --*-- encoding: utf-8 --*--

import fileinput
import sys

CRLF = "\r\n"
ACCENT_AIGU = "\\u00e9"
ACCENT_GRAVE = "\\u00e8"
C_CEDILLE = "\\u00e7"
A_ACCENTUE = "\\u00e0"
E_CIRCONFLEXE = "\\u00ea"
CURRENT_ENCODING = "utf-8"

#Getting filepath
#(the prompt is French for: "Please enter the file path (use \ or /, either works):")
print "Veuillez entrer le chemin du fichier (utiliser des \\ ou /, c'est pareil) :"
path = str(raw_input())
path.replace("\\", "/")

#removing trailing whitespace characters
for line in fileinput.FileInput(path, inplace=1):
    if line != CRLF:
        line = line.rstrip()
        print line
        print &gt;&gt;sys.stderr, line
    else:
        print CRLF
        print &gt;&gt;sys.stderr, CRLF
fileinput.close()

#Replacing bad characters
for line in fileinput.FileInput(path, inplace=1):
    line = line.decode(CURRENT_ENCODING)
    line = line.replace(ACCENT_AIGU, "é")
    line = line.replace(ACCENT_GRAVE, "è")
    line = line.replace(A_ACCENTUE, "à")
    line = line.replace(E_CIRCONFLEXE, "ê")
    line = line.replace(C_CEDILLE, "ç")
    line.encode(CURRENT_ENCODING)
    sys.stdout.write(line) #avoid CRLF added by print
    print &gt;&gt;sys.stderr, line
fileinput.close()
</code></pre>
<h1>EDIT</h1>
<p>The input file contains this type of text:</p>
<pre><code> * Cette m\u00e9thode permet d'appeller le service du module de tourn\u00e9e
 * &lt;code&gt;rechercherTechnicien&lt;/code&gt; et retourne la liste repr\u00e9sentant le num\u00e9ro
 * de la tourn\u00e9e ainsi que le nom et le pr\u00e9nom du technicien et la dur\u00e9e
 * th\u00e9orique por se rendre au point d'intervention.
 *
</code></pre>
<h1>EDIT2</h1>
<p>Final code, if anyone is interested: the first part replaces the badly encoded characters, and the second part removes all trailing whitespace characters.</p>
<pre><code># --*-- encoding: iso-8859-1 --*--

import fileinput
import re

CRLF = "\r\n"

print "Veuillez entrer le chemin du fichier (utiliser des \\ ou /, c'est pareil) :"
path = str(raw_input())
path = path.replace("\\", "/")

def unicodize(seg):
    # Segments that are a literal \uXXXX escape are decoded to the real character;
    # everything else is ordinary UTF-8 text.
    if re.match(r'\\u[0-9a-f]{4}', seg):
        return seg.decode('unicode-escape')
    return seg.decode('utf-8')

print "Replacing badly encoded characters"
with open(path,"r") as f:
    content = f.read()

# The capturing group keeps the escapes as separate segments in the split result.
replaced = (unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})',content))

with open(path, "w") as o:
    o.write(''.join(replaced).encode("utf-8"))

print "Removing trailing whitespace characters"
for line in fileinput.FileInput(path, inplace=1):
    if line != CRLF:
        line = line.rstrip()
        print line
    else:
        print CRLF
fileinput.close()

print "Done!"
</code></pre>
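<p>For comparison, here is a minimal sketch of the same two steps in Python 3 rather than the Python 2.7 used above. It is not the code from the question: the names <code>unescape_unicode</code>, <code>strip_trailing_whitespace</code> and <code>clean_file</code> are hypothetical, and it assumes the file contains only literal <code>\uXXXX</code> escapes for characters in the Basic Multilingual Plane, as in the sample input.</p>

```python
import re

# Matches a literal backslash-u escape such as \u00e9 (six ASCII characters).
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")

# Matches spaces/tabs that sit directly before a line ending or the end of text.
TRAILING_WS_RE = re.compile(r"[ \t]+(?=\r?\n|\Z)")


def unescape_unicode(text):
    r"""Replace each literal \uXXXX sequence with the character it names.

    Assumption: no surrogate pairs are present (BMP characters only).
    """
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


def strip_trailing_whitespace(text):
    """Remove trailing spaces and tabs on every line, keeping line endings."""
    return TRAILING_WS_RE.sub("", text)


def clean_file(path, encoding="utf-8"):
    """Rewrite the file in place: decode escapes, then strip trailing whitespace."""
    # newline="" disables newline translation, so CRLF endings are preserved.
    with open(path, "r", encoding=encoding, newline="") as f:
        content = f.read()
    cleaned = strip_trailing_whitespace(unescape_unicode(content))
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(cleaned)
```

<p>Reading the whole file as text, transforming it and writing it back avoids mixing byte strings and unicode strings line by line, at the cost of holding the file in memory.</p>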