  1. Efficiently read large spreadsheet file in C++
    <p>I normally use the method described in <a href="https://stackoverflow.com/questions/1120140/csv-parser-in-c/1120224#1120224">csv parser</a> to read spreadsheet files. However, reading a 64 MB file with around 40 columns and 250K rows of data takes about 4 minutes. In the original method, a CSVRow class reads the file row by row, and a private vector stores all the data in a row.</p> <p>Several things to note:</p> <ul> <li>I did reserve enough capacity for the vector, but it did not help much.</li> <li>I also need to create instances of some class when reading each line, but even when the code just reads in the data without creating any instances, it takes a long time.</li> <li>The file is tab-delimited instead of comma-delimited, but I don't think that matters.</li> </ul> <p>Since some columns in that file do not hold useful data, I changed the method to keep a private string member that stores the whole row, and then find the positions of the (n-1)th and the nth delimiters to extract the useful data (there are still many useful columns). This avoids some push_back operations and cut the time to a little more than 2 minutes. However, that still seems too long to me.</p> <p>Here are my questions:</p> <ol> <li><p>Is there a way to read such a spreadsheet file more efficiently?</p></li> <li><p>Should I read the file by buffer instead of line by line? If so, how do I read by buffer and still use the CSVRow class?</p></li> <li>I haven't tried Boost Tokenizer; is that more efficient?</li> </ol> <p>Thank you for your help!</p>
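    The single-string approach described above, combined with the "read by buffer" idea from the second question, might look roughly like the sketch below. It is a minimal illustration assuming C++17 (for std::string_view), not the linked CSVRow code. The names read_whole_file, for_each_row, and nth_field are hypothetical, and the sketch has not been timed against the 64 MB file.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: read the entire file into one string in a single pass,
// instead of calling std::getline once per row.
std::string read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: return the n-th (0-based) field of a row by locating
// the delimiters around it. The result is a view into `row`; nothing is copied.
// Returns an empty view if the row has fewer than n + 1 fields.
std::string_view nth_field(std::string_view row, std::size_t n, char delim = '\t') {
    std::size_t start = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t pos = row.find(delim, start);
        if (pos == std::string_view::npos) return {};
        start = pos + 1;
    }
    const std::size_t end = row.find(delim, start);
    return row.substr(start, end == std::string_view::npos ? end : end - start);
}

// Hypothetical helper: call fn(row) for every line in the buffer.
// A trailing '\r' (Windows line endings) is stripped from each row.
template <typename Fn>
void for_each_row(std::string_view buffer, Fn&& fn) {
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\n', start);
        if (end == std::string_view::npos) end = buffer.size();
        std::string_view row = buffer.substr(start, end - start);
        if (!row.empty() && row.back() == '\r') row.remove_suffix(1);
        fn(row);
        start = end + 1;
    }
}
```

    A caller would keep the string returned by read_whole_file alive, pass it to for_each_row, and call nth_field(row, k) only for the columns it needs. The views point into that buffer, so they must not outlive it. When many columns are needed from each row, scanning the row once and recording every delimiter position would avoid rescanning from the start for each field.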