Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>As other mentionned, you can process in steps.<br> I would like to explain this with my own words (differs on point 3) :</p> <ol> <li><p>Read the file sequentially, process N records at a time in memory (N is arbitrary, depending on your memory constraint and the number T of temporary files that you want).</p></li> <li><p>Sort the N records in memory, write them to a temp file. Loop on T until you are done.</p></li> <li><p><strong>Open all the T temp files at the same time, but read only one record per file.</strong> (Of course, with buffers). For each of these T records, find the smaller, write it to the final file, and advance only in that file.</p></li> </ol> <hr> <p>Advantages:</p> <ul> <li>The <strong>memory</strong> consumption is as low as you want.</li> <li>You only do the <strong>double of disk accesses</strong> comparing to a everything-in-memory policy. Not bad! :-)</li> </ul> <hr> <p>Exemple with numbers:</p> <ol> <li>Original file with 1 million records.</li> <li>Choose to have 100 temp files, so read and sort 10 000 records at a time, and drop these in their own temp file.</li> <li>Open the 100 temp file at a time, read the first record in memory.</li> <li>Compare the first records, write the smaller and advance this temp file.</li> <li>Loop on step 5, one million times.</li> </ol> <hr> <p><strong>EDITED</strong></p> <p>You mentionned a multi-threaded application, so I wonder ...</p> <p>As we seen from these discussions on this need, using less memory gives less performance, with a dramatic factor in this case. So I could also suggest to <strong>use only one thread</strong> to process only one sort at a time, not as a multi-threaded application.</p> <p>If you process ten threads, each with a tenth of the memory available, your performance will be miserable, much much less than a tenth of the initial time. If you use only one thread, and queue the 9 other demands and process them in turn, you global performance will be much better, you will finish the ten tasks much faster.</p> <hr> <p>After reading this response : <a href="https://stackoverflow.com/questions/2087469/sort-a-file-with-huge-volume-of-data-given-memory-constraint/2087514#2087514">Sort a file with huge volume of data given memory constraint</a> I suggest you consider this distribution sort. It could be huge gain in your context.</p> <p>The improvement over my proposal is that you don't need to open all the temp files at once, you only open one of them. It saves your day! :-)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload