PHP character encoding hell: reading a CSV file with fgets
<p>I have a website that receives a CSV file by FTP once a month. For years it was an ASCII file. Now I receive UTF-8 one month, UTF-16BE the next, and UTF-16LE the month after that. Maybe I'll get UTF-32 next month. <code>fgets</code> returns the byte order mark (BOM) at the beginning of the UTF files. How can I get PHP to recognize the character encoding automatically? I tried <code>mb_detect_encoding</code>, and it returned ASCII regardless of the file's encoding. I then changed my code to read the BOM and pass the detected encoding explicitly to <code>mb_convert_encoding</code>. This worked until the latest file, which is UTF-16LE: the first line is read correctly, but every subsequent line comes out as question marks ("?"). What am I doing wrong?</p> <pre><code>$fhandle = fopen( $file_in, "r" );
if ( $fhandle === false ) {
    echo "&lt;p class=redbold&gt;Error opening file $file_in.&lt;/p&gt;";
    die();
}
$i = 0;
while ( ( $line = fgets( $fhandle ) ) !== false ) {
    $i++;
    // Detect encoding on first line. Actual text always begins with string "Document"
    if ( $i == 1 ) {
        $line_start = substr( $line, 0, 4 );
        $line_start_hex = bin2hex( $line_start );
        if ( strcmp( $line_start, 'Docu' ) == 0 ) {
            $char_encoding = 'ASCII';
        } elseif ( strcmp( $line_start_hex, 'efbbbf44' ) == 0 ) {
            $char_encoding = 'UTF-8';
            $line = substr( $line, 3 );
        } elseif ( strcmp( $line_start_hex, 'fffe4400' ) == 0 ) {
            $char_encoding = 'UTF-16LE';
            $line = substr( $line, 2 );
        } elseif ( strcmp( $line_start_hex, 'feff0044' ) == 0 ) {
            // UTF-16BE: BOM FE FF, then "D" encoded as 00 44
            $char_encoding = 'UTF-16BE';
            $line = substr( $line, 2 );
        } else {
            echo "&lt;p class=redbold&gt;Error, unknown character encoding. Line =&lt;br&gt;", $line_start_hex, '&lt;/p&gt;';
            require( '../footer.php' );
            die();
        }
        echo "&lt;p&gt;char_encoding = $char_encoding&lt;/p&gt;";
    }
    // Convert UTF
    if ( $char_encoding != 'ASCII' ) {
        $line = mb_convert_encoding( $line, 'ASCII', $char_encoding );
    }
    echo '&lt;p&gt;';
    var_dump( $line );
    echo '&lt;/p&gt;';
}
</code></pre> <p>Output:</p> <pre><code>char_encoding = UTF-16LE
string(101) "DocumentNumber,RecordedTS,Title,PageCount,City,TransTaxAccountCode,TotalTransferTax,Description,Name "
string(83) "???????????????????????????????????????????????????????????????????????????????????"
string(88) "????????????????????????????????????????????????????????????????????????????????????????"
string(84) "????????????????????????????????????????????????????????????????????????????????????"
string(80) "????????????????????????????????????????????????????????????????????????????????"
</code></pre>
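The BOM check described in the question can also be written as a lookup table instead of a chain of hex-string comparisons. This is a minimal sketch, not the asker's code: `detect_bom()` is a hypothetical helper, and it assumes the file either starts with one of the standard BOMs or is plain ASCII, as the monthly file historically was (a UTF-8 file without a BOM would also be reported as ASCII). It also covers the UTF-32 variants the question anticipates.

```php
<?php
// Sketch: map a leading byte order mark (BOM) to an encoding name that
// mb_convert_encoding() accepts. detect_bom() is a hypothetical helper,
// not a PHP built-in. UTF-32 BOMs are listed first because the UTF-32LE
// BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
function detect_bom(string $bytes): array
{
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];
    foreach ($boms as $bom => $encoding) {
        // strncmp() is binary-safe, so the NUL bytes in the BOMs are fine.
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            // Return the encoding and the BOM length so the caller can strip it.
            return [$encoding, strlen($bom)];
        }
    }
    // No BOM found: assume plain ASCII (an assumption, see above).
    return ['ASCII', 0];
}
```

For a UTF-16LE header such as `"\xFF\xFE" . "D\x00o\x00"` it returns `['UTF-16LE', 2]`; the second element is the number of bytes to strip before converting.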
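One plausible cause of the question marks, offered as an assumption rather than a confirmed diagnosis: `fgets()` ends a line at the single byte 0x0A, but in UTF-16LE a newline is the two bytes `0A 00`. Each line after the first therefore starts with the stray `00` byte left over from the previous newline, the byte pairs are misaligned, and every decoded character falls outside ASCII, which `mb_convert_encoding()` replaces with `?`. The sketch below reproduces this on a hypothetical in-memory file and then shows one alternative: convert the whole file to UTF-8 first and split it into lines afterwards. `read_lines_utf8()` is a hypothetical helper, and the approach assumes the file is small enough to hold in memory.

```php
<?php
// Hypothetical two-line UTF-16LE file with CRLF line endings: BOM, "AB", "CD".
$utf16 = "\xFF\xFE" . "A\x00B\x00\r\x00\n\x00" . "C\x00D\x00\r\x00\n\x00";

// fgets() stops after the single byte 0x0A, so it splits the two-byte
// UTF-16LE newline "\n\x00" in half.
$fh = fopen('php://memory', 'w+b');
fwrite($fh, $utf16);
rewind($fh);
$first  = fgets($fh); // FF FE 41 00 42 00 0D 00 0A: 9 bytes, an odd count
$second = fgets($fh); // 00 43 00 44 00 0D 00 0A: starts with the stray 0x00
fclose($fh);
// Decoding $second as UTF-16LE pairs the bytes wrongly (00 43 -> U+4300, ...),
// so every character is non-ASCII and becomes "?" when converted to ASCII.

// Alternative sketch: convert the whole buffer to UTF-8 first, then split.
// read_lines_utf8() is a hypothetical helper, not a PHP built-in.
function read_lines_utf8(string $bytes, string $encoding, int $bomLength): array
{
    $text = substr($bytes, $bomLength); // drop the BOM
    if ($encoding !== 'UTF-8' && $encoding !== 'ASCII') {
        $text = mb_convert_encoding($text, 'UTF-8', $encoding);
    }
    // After conversion a newline is a single byte again, so splitting is safe.
    return preg_split('/\r\n|\n|\r/', rtrim($text, "\r\n"));
}

$lines = read_lines_utf8($utf16, 'UTF-16LE', 2); // ['AB', 'CD']
```

In the question's setting, the bytes would come from `file_get_contents($file_in)` and the encoding and BOM length from the first few bytes of that string. Converting to UTF-8 rather than to ASCII also avoids turning legitimate non-ASCII characters into `?`.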
 
