Read a large amount of ASCII numbers and write them in binary form
<p>I have data files with about 1.5 GB worth of floating-point numbers stored as ASCII text separated by whitespace, e.g., <code>1.2334 2.3456 3.4567</code> and so on.</p>
<p>Before processing such numbers I first translate the original file to binary format. This is helpful because I can choose whether to use <code>float</code> or <code>double</code>, reduce the file size (to about 800 MB for <code>double</code> and 400 MB for <code>float</code>), and read in chunks of the appropriate size once I am processing the data.</p>
<p>I wrote the following function to do the ASCII-to-binary translation:</p>
<pre><code>#include &lt;fstream&gt;
#include &lt;string&gt;

template&lt;typename RealType=float&gt;
void ascii_to_binary(const std::string&amp; fsrc, const std::string&amp; fdst){
    RealType value;
    std::fstream src(fsrc.c_str(), std::fstream::in | std::fstream::binary);
    std::fstream dst(fdst.c_str(), std::fstream::out | std::fstream::binary);
    // Parse one number at a time and append its raw bytes to the output
    while(src &gt;&gt; value){
        dst.write(reinterpret_cast&lt;const char*&gt;(&amp;value), sizeof(RealType));
    }
    // RAII closes both files
}
</code></pre>
<p>I would like to speed up <code>ascii_to_binary</code>, but I can't come up with anything. I tried reading the file in chunks of 8192 bytes and then processing the buffer in another subroutine. This gets complicated because the last few characters in the buffer may be whitespace (in which case all is good) or a truncated number (which is very bad), and the logic to handle the possible truncation hardly seems worth it.</p>
<p>What would you do to speed up this function? I would rather rely on standard C++ (C++11 is OK) with no additional dependencies such as Boost.</p>
<p>Thank you.</p>
<h1>Edit:</h1>
<p>@DavidSchwarts:</p>
<p>I tried to implement your suggestion as follows:</p>
<pre><code>#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

template&lt;typename RealType=float&gt;
void ascii_to_binary(const std::string&amp; fsrc, const std::string&amp; fdst){
    std::vector&lt;RealType&gt; buffer;
    typedef typename std::vector&lt;RealType&gt;::iterator VectorIterator;
    buffer.reserve(65536);
    std::fstream src(fsrc, std::fstream::in | std::fstream::binary);
    std::fstream dst(fdst, std::fstream::out | std::fstream::binary);
    while(true){
        size_t k = 0;
        while(k&lt;65536 &amp;&amp; src &gt;&gt; buffer[k]) k++;
        dst.write((char*)&amp;buffer[0], buffer.size());
        if(k&lt;65536){
            break;
        }
    }
}
</code></pre>
<p>But it does not seem to be writing the data! I'm working on it...</p>
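A likely reason the edited version writes nothing: `buffer.reserve(65536)` only allocates capacity and leaves `buffer.size()` at 0, so `buffer[k]` indexes past the end of the vector (undefined behaviour) and `dst.write((char*)&buffer[0], buffer.size())` writes zero bytes. `write` also expects a byte count, not an element count. Below is a minimal sketch of the same batching idea with those two points fixed. It assumes a batch of 65536 values as in the edit; the stream-based helper `ascii_to_binary_stream` is a hypothetical name introduced so the logic can be exercised without files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse whitespace-separated numbers from `src` and write
// them to `dst` as raw binary, kBatch values per write() call.
// Returns the number of values written.
template <typename RealType = float>
std::size_t ascii_to_binary_stream(std::istream& src, std::ostream& dst) {
    const std::size_t kBatch = 65536;      // values per write() call
    std::vector<RealType> buffer(kBatch);  // resize, not reserve: elements must exist before buffer[k] is assigned
    std::size_t total = 0;
    while (true) {
        std::size_t k = 0;
        while (k < kBatch && src >> buffer[k]) ++k;
        // write() takes a byte count, so scale the element count by sizeof(RealType)
        dst.write(reinterpret_cast<const char*>(buffer.data()),
                  static_cast<std::streamsize>(k * sizeof(RealType)));
        total += k;
        if (k < kBatch) break;             // input exhausted (or a parse error)
    }
    return total;
}

// File-based wrapper with the same signature as the function in the question.
template <typename RealType = float>
void ascii_to_binary(const std::string& fsrc, const std::string& fdst) {
    std::ifstream src(fsrc);
    std::ofstream dst(fdst, std::ios::binary);
    ascii_to_binary_stream<RealType>(src, dst);
}
```

Note that `std::ofstream` already buffers its output internally, so batching the `write` calls may gain little on its own; the per-value `operator>>` parsing is probably the larger cost, which is worth confirming with a profiler on the real files.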
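The raw-chunk approach described in the question can handle a number split across the chunk boundary with a small amount of bookkeeping: parse only up to the last whitespace character in the buffer, then move the unparsed tail to the front of the buffer before the next read. The sketch below assumes whitespace-separated input, no single number longer than the chunk, and `RealType` of `float` or `double`; `ascii_to_binary_chunked` and `parse_real` are hypothetical names. It swaps `operator>>` for `std::strtof`/`std::strtod`, which use the global C locale and also accept forms such as hexadecimal floats, `inf` and `nan`, so edge-case behaviour may differ from the stream version.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical overloads so float input is parsed with strtof and double with strtod.
inline float  parse_real(const char* s, char** end, float)  { return std::strtof(s, end); }
inline double parse_real(const char* s, char** end, double) { return std::strtod(s, end); }

// Hypothetical helper: read `src` in raw chunks of `chunk` bytes, parse every
// complete number, and carry a possibly truncated trailing token over to the
// next chunk. Returns the number of values written to `dst`.
template <typename RealType = float>
std::size_t ascii_to_binary_chunked(std::istream& src, std::ostream& dst,
                                    std::size_t chunk = 1 << 16) {
    std::vector<char> data(chunk + 1);  // +1 for a terminating '\0'
    std::vector<RealType> values;
    std::size_t carry = 0;              // bytes kept from the previous chunk
    std::size_t total = 0;
    while (true) {
        src.read(data.data() + carry, static_cast<std::streamsize>(chunk - carry));
        const std::size_t len = carry + static_cast<std::size_t>(src.gcount());
        const bool at_end = !src;       // short read: no more input
        data[len] = '\0';               // the parser must never run past the data

        // Parse only up to the last whitespace character; what follows may be
        // a number cut in half by the chunk boundary.
        std::size_t cut = len;
        if (!at_end) {
            while (cut > 0 && !std::isspace(static_cast<unsigned char>(data[cut - 1]))) --cut;
            if (cut == 0) throw std::runtime_error("token longer than chunk size");
        }

        values.clear();
        const char* p = data.data();
        const char* end = data.data() + cut;
        while (true) {
            while (p < end && std::isspace(static_cast<unsigned char>(*p))) ++p;
            if (p >= end) break;
            char* next = nullptr;
            const RealType v = parse_real(p, &next, RealType{});
            if (next == p) throw std::runtime_error("malformed number");
            values.push_back(v);
            p = next;
        }
        dst.write(reinterpret_cast<const char*>(values.data()),
                  static_cast<std::streamsize>(values.size() * sizeof(RealType)));
        total += values.size();

        if (at_end) break;
        // Move the unparsed tail to the front of the buffer for the next read.
        carry = len - cut;
        std::memmove(data.data(), data.data() + cut, carry);
    }
    return total;
}
```

To use it on files, pass named `std::ifstream` and `std::ofstream` objects opened with `std::ios::binary` (the function takes non-const references). It may beat the `operator>>` loop because it avoids the per-value stream sentry and `num_get` facet calls, but that is not guaranteed and should be measured on the actual data.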