Parsing a CSV file using different encodings and libraries
<p>Despite numerous SO threads on the topic, I'm having trouble parsing a CSV file. It's a .csv file downloaded from the Adwords Keyword Planner. Previously, Adwords had the option of exporting data as 'plain CSV' (which could be parsed with the Ruby CSV library); now the options are either Adwords CSV or Excel CSV. BOTH of these formats cause this problem (illustrated by a terminal session):</p>
<pre><code>file = File.open('public/uploads/testfile.csv')
=&gt; #&lt;File:public/uploads/testfile.csv&gt;
file.read.encoding
=&gt; #&lt;Encoding:UTF-8&gt;
require 'csv'
=&gt; true
CSV.foreach(file) { |row| puts row }
ArgumentError: invalid byte sequence in UTF-8
</code></pre>
<p>Let's change the encoding and see if that helps:</p>
<pre><code>file.close
=&gt; nil
file = File.open("public/uploads/testfile.csv", "r:ISO-8859-1")
=&gt; #&lt;File:public/uploads/testfile.csv&gt;
file.read.encoding
=&gt; #&lt;Encoding:ISO-8859-1&gt;
CSV.foreach(file) { |row| puts row }
ArgumentError: invalid byte sequence in UTF-8
</code></pre>
<p>Let's try using a different CSV library:</p>
<pre><code>require 'smarter_csv'
=&gt; true
file.close
=&gt; nil
file = SmarterCSV.process('public/uploads/testfile.csv')
ArgumentError: invalid byte sequence in UTF-8
</code></pre>
<p>Is this a no-win situation? Do I have to roll my own CSV parser?</p>
<p>I'm using Ruby 1.9.3p374. Thanks!</p>
<p><strong>UPDATE 1:</strong></p>
<p>Using the suggestions in the comments, here's the current version:</p>
<pre><code>file_contents = File.open("public/uploads/new-format/testfile-adwords.csv", 'rb').read
require 'iconv' unless String.method_defined?(:encode)
if String.method_defined?(:encode)
  file_contents.encode!('UTF-16', 'UTF-8', :invalid =&gt; :replace, :replace =&gt; '')
  file_contents.encode!('UTF-8', 'UTF-16')
else
  ic = Iconv.new('UTF-8', 'UTF-8//IGNORE')
  file_contents = ic.iconv(file_contents)
end
file_contents.gsub!(/\0/, '') # needed because otherwise I get "string contains null byte (ArgumentError)"
CSV.foreach(file_contents, :headers =&gt; true, :header_converters =&gt; :symbol) do |row|
  puts row
end
</code></pre>
<p>This doesn't work: now I get a "file name too long" error.</p>
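<p>One possible reading of the "file name too long" error in Update 1, offered as a sketch rather than a confirmed fix: <code>CSV.foreach</code> expects a file path as its first argument, so handing it the file's contents makes Ruby try to open a file with that name. <code>CSV.parse</code> accepts the data itself. The helper names and the sample data below are hypothetical and exist only for illustration; the byte-stripping step reuses the UTF-16 round trip from the update.</p>

```ruby
require 'csv'

# Hypothetical helper (not from the question): drop bytes that are not
# valid UTF-8, using the same UTF-16 round trip as in Update 1.
def strip_invalid_utf8(raw)
  raw.encode('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
     .encode('UTF-8', 'UTF-16')
end

# Hypothetical helper: parse CSV text that is already in memory.
# CSV.parse takes the data itself, whereas CSV.foreach takes a file path.
def parse_csv_text(text)
  CSV.parse(text, :headers => true, :header_converters => :symbol)
end

# Made-up sample data with one invalid byte (\xFF) in the first data row,
# marked as binary to mimic a file read with mode 'rb'.
sample = "Keyword,Avg Monthly Searches\nruby c\xFFsv,1000\nparse csv,500\n"
sample = sample.dup.force_encoding('BINARY')

table = parse_csv_text(strip_invalid_utf8(sample))
table.each { |row| puts row.to_hash }
```

<p>Stripping invalid bytes silently discards data. If the export is really in another encoding (the null bytes hint that it might be UTF-16, though that is only a guess), transcoding from that encoding would preserve more than deleting bytes does.</p>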
 
