What is the best way to handle uploaded text files of different encodings?

<p>Internally, our PHP application uses UTF-8, and we process .csv files and fixed-width text files. We have written some libraries (essentially classes) to work with these files.</p> <p>We recently added the ability for administrators to upload files of these types for processing, and quickly ran into issues with files created on different operating systems. We soon realised that the files being read in used different encodings from our application (e.g. Windows-1252 or ISO-8859).</p> <p>Since we cannot control the encoding of the files submitted to us, my question is: what is the best way to handle uploaded text files of different encodings? I can currently think of two solutions:</p> <ul> <li>When a file is received, detect its encoding, convert it to UTF-8, and re-save it. The rest of the system then only needs to be UTF-8 aware and can ignore encoding issues.</li> <li>Change the CSV and fixed-width libraries so that they become encoding-aware themselves.</li> </ul> <p>I have also thought about the pros and cons of each:</p> <ul> <li>Converting on input keeps the libraries smaller and reduces duplication, but it seems wasteful in terms of processing.</li> <li>Making the libraries encoding-aware seems to involve more code, but might be faster.</li> </ul> <p>Thoughts, please?</p> <p>Edit: I am really interested to know where, architecturally, character encoding conversion should happen: at the point of input, or when the files are used?</p>
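
To make the first option concrete, here is a minimal sketch of converting at the point of input. It assumes PHP's mbstring extension is available. The function name `to_utf8` and the choice of Windows-1252 as the fallback are hypothetical: single-byte encodings cannot be reliably told apart from the bytes alone, so the fallback is a guess that would need to match the files actually received.

```php
<?php
// Hypothetical helper: normalise the raw bytes of an uploaded file to UTF-8.
// $fallback is an assumed source encoding, used only when the input is not valid UTF-8.
function to_utf8(string $bytes, string $fallback = 'Windows-1252'): string
{
    // Strip a UTF-8 byte order mark if one is present.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
    }

    // Valid UTF-8 (which includes plain ASCII) passes through unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise treat the bytes as the assumed fallback encoding and convert.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}
```

With something like this applied once when the upload is received (for example, reading the file with `file_get_contents`, converting, and writing it back with `file_put_contents`), the CSV and fixed-width classes would only ever see UTF-8.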