<p>I did <em>exactly</em> the same thing, except that my fields were separated by tabs (<code>'\t'</code>), and I also had to handle non-numeric comments at the end of each line and header rows interspersed with the data.</p>
<p><a href="http://dl.dropbox.com/u/6919979/breaktdf%20instructions.pdf" rel="nofollow">Here</a> is the documentation for my utility.</p>
<p>I also had a speed problem. Here are the things I did to improve performance by around 20x:</p>
<ul>
<li>Replace explicit file reads with memory-mapped files. Map two blocks at once. When processing a line leaves you in the second block, remap so that the view covers the second and third blocks. This way a line that straddles a block boundary is still contiguous in memory. (This assumes that no line is larger than a block; you can probably increase the block size to guarantee this.)</li>
<li>Use SIMD instructions such as <code>_mm_cmpeq_epi8</code> to search for line endings or other separator characters. In my case, any line containing an <code>'='</code> character was a metadata row that needed different processing.</li>
<li>Use a barebones number parsing function. I used a custom one for parsing times in HH:MM:SS format; <code>strtod</code> and <code>strtol</code> are perfect for grabbing ordinary numbers. These are much faster than <code>istream</code> formatted extraction functions.</li>
<li>Use the OS file write API instead of the standard C++ API.</li>
</ul>
<p>If you dream of throughput in the 300,000 lines/second range, then you should consider a similar approach.</p>
<p>Your executable also shrinks when you don't use C++ standard streams. Mine is 205KB, including a graphical interface, and depends only on DLLs that ship with Windows (no MSVCRTxx.dll needed). And looking again, I am still using C++ streams for status reporting.</p>
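<p>As a rough illustration of the parsing side of the list above, here is a minimal, portable sketch. It is not the code from my utility: the names <code>ParsedLine</code>, <code>parse_line</code> and <code>parse_buffer</code> are made up for this example, the input is assumed to already sit in one NUL-terminated buffer (where a memory-mapped view would otherwise be), and plain <code>memchr</code> stands in for the SIMD search. Storing a metadata row's raw text for later handling is also just an assumption of the sketch.</p>

```cpp
#include <cstddef>
#include <cstdlib>   // std::strtod
#include <cstring>   // std::memchr
#include <string>
#include <vector>

// Hypothetical record type for this sketch.
struct ParsedLine {
    bool is_metadata = false;     // true if the line contains '='
    std::vector<double> values;   // leading tab-separated numeric fields
    std::string comment;          // trailing non-numeric text (or the raw metadata row)
};

// Parse one line in [begin, end). Assumes the buffer is NUL-terminated
// somewhere at or after `end`, so std::strtod always stops inside it.
ParsedLine parse_line(const char* begin, const char* end) {
    ParsedLine out;
    const std::size_t len = static_cast<std::size_t>(end - begin);

    // Metadata rows are detected by a single scan for '='.
    if (std::memchr(begin, '=', len) != nullptr) {
        out.is_metadata = true;
        out.comment.assign(begin, end);
        return out;
    }

    const char* p = begin;
    while (p < end) {
        if (*p == '\t') { ++p; continue; }   // step over the field separator

        char* stop = nullptr;
        const double v = std::strtod(p, &stop);

        // Not a number, a number running past this line, or a number glued
        // to other text: treat the rest of the line as a comment.
        const bool clean_end = (stop == end) || (stop < end && *stop == '\t');
        if (stop == p || !clean_end) {
            out.comment.assign(p, end);
            break;
        }
        out.values.push_back(v);
        p = stop;
    }
    return out;
}

// Split a whole buffer into lines with memchr and parse each one.
std::vector<ParsedLine> parse_buffer(const std::string& text) {
    std::vector<ParsedLine> rows;
    const char* p = text.data();              // NUL-terminated since C++11
    const char* const buf_end = p + text.size();

    while (p < buf_end) {
        const void* nl = std::memchr(p, '\n', static_cast<std::size_t>(buf_end - p));
        const char* line_end = nl ? static_cast<const char*>(nl) : buf_end;
        const char* next = nl ? line_end + 1 : buf_end;

        if (line_end > p && line_end[-1] == '\r') --line_end;  // tolerate CRLF
        rows.push_back(parse_line(p, line_end));
        p = next;
    }
    return rows;
}
```

<p>Calling <code>parse_buffer</code> on text with one data row, one <code>name=value</code> row and one CRLF-terminated row should give three records, with the numbers in <code>values</code> and any trailing text in <code>comment</code>. Note that <code>strtod</code> skips leading whitespace, which is why the sketch checks that parsing stopped inside the current line.</p>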
 
