
    <p>I can't speak to the analysis of the memory usage, but you might try this to get the sort to work without running out of memory. You sort into a new file that is accessed through a memory mapping (I've been led to believe this is efficient in terms of memory). mmap behaves somewhat differently across operating systems; I tested this on Linux only, and at a very small scale.</p>
    <p>This is the basic code. It scans the sorted file linearly for each input line, so to get decent time efficiency you'd probably want a binary search on the sorted file to find where to insert the line; otherwise it will probably take a long time.</p>
    <p>You can find a file-seeking binary search algorithm in <a href="https://stackoverflow.com/questions/744256/reading-huge-file-in-python">this question</a>.</p>
    <p>Hopefully a memory-efficient way of sorting a massive file by line:</p>
<pre><code>import os
from mmap import mmap

# Binary mode: mmap reads and writes bytes, not str.
input_file = open('unsorted.txt', 'rb')
output_file = open('sorted.txt', 'w+b')

# An empty file cannot be mapped, so copy the first line over
# to give the output file a non-zero size.
output_file.write(input_file.readline())
output_file.flush()
mm = mmap(output_file.fileno(), os.stat(output_file.name).st_size)
cur_size = mm.size()

for line in input_file:
    mm.seek(0)
    tup = line.split(b"\t")
    while True:
        cur_loc = mm.tell()
        o_line = mm.readline()
        o_tup = o_line.split(b"\t")
        if o_line == b'' or tup[0] &lt; o_tup[0]:
            # EOF or we found our spot: grow the map, shift the tail
            # right by len(line), then write the new line into the gap.
            mm.resize(cur_size + len(line))
            mm[cur_loc+len(line):] = mm[cur_loc:cur_size]
            mm[cur_loc:cur_loc+len(line)] = line
            cur_size += len(line)
            break
</code></pre>
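    <p>The binary search suggested above could look roughly like the sketch below. It is not the algorithm from the linked question. It assumes the sorted data is newline-terminated, that the sort key is the first tab-separated field compared as bytes, and that the buffer supports <code>find</code>, <code>rfind</code> and slicing (an <code>mmap</code> or a plain <code>bytes</code> object). The name <code>find_insert_offset</code> is made up for illustration.</p>

```python
def find_insert_offset(buf, key, size=None):
    """Return the byte offset of the first line whose key is greater than `key`.

    `buf` holds sorted, newline-separated lines; the key of a line is its
    first tab-separated field. Returns `size` if no line has a larger key.
    """
    if size is None:
        size = len(buf)
    lo, hi = 0, size  # both always point at a line start (or at `size`)
    while lo < hi:
        mid = (lo + hi) // 2
        # Back up from mid to the start of the line that contains it.
        nl = buf.rfind(b"\n", lo, mid)
        start = lo if nl == -1 else nl + 1
        # Find the end of that line and the start of the next one.
        end = buf.find(b"\n", start, size)
        if end == -1:  # last line has no trailing newline
            end = size
            next_start = size
        else:
            next_start = end + 1
        line_key = buf[start:end].split(b"\t", 1)[0]
        if line_key > key:
            hi = start        # insertion point is at or before this line
        else:
            lo = next_start   # insertion point is after this line
    return lo
```

    <p>In the loop above, the inner <code>while True</code> scan could then be replaced by <code>cur_loc = find_insert_offset(mm, tup[0], cur_size)</code>, followed by the same resize-and-shift steps. This only speeds up finding the position; shifting the tail of the file is still linear in its size for every inserted line.</p>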