Why is loading this file taking so much memory?
<p>I'm trying to load a file into Python. It's a very big file (1.5 GB), but I have the available memory and I only want to do this once (hence the use of Python: I just need to sort the file one time, so Python was an easy choice).</p>
<p>My issue is that loading this file results in <strong>way</strong> too much memory usage. When I've loaded about 10% of the lines into memory, Python is already using 700 MB, which is clearly too much. At around 50% the script hangs, using 3.03 GB of real memory (and slowly rising).</p>
<p>I know this isn't the most memory-efficient way to sort a file, but I just want it to work so I can move on to more important problems :D So, what is wrong with the following Python code that's causing the massive memory usage?</p>
<pre><code>print 'Loading file into memory'
input_file = open(input_file_name, 'r')
input_file.readline() # Toss out the header
lines = []
totalLines = 31164015.0
currentLine = 0.0
printEvery100000 = 0
for line in input_file:
    currentLine += 1.0
    lined = line.split('\t')
    printEvery100000 += 1
    if printEvery100000 == 100000:
        print str(currentLine / totalLines)
        printEvery100000 = 0
    lines.append( (lined[timestamp_pos].strip(), lined[personID_pos].strip(), lined[x_pos].strip(), lined[y_pos].strip()) )
input_file.close()
print 'Done loading file into memory'
</code></pre>
<p>EDIT: In case anyone is unsure, the general consensus seems to be that every object allocated carries its own memory overhead, so storing four separate strings plus a tuple per line adds up quickly. I "fixed" it in this case by calling readlines(), which still loads all the data but has only one string's worth of overhead per line. This loads the entire file using about 1.7 GB. Then, when I call lines.sort(), I pass a function as key that splits on tabs and returns the right column value, converted to an int. This is slow computationally, and memory-intensive overall, but it works. Learned a ton about variable allocation overhead today :D</p>
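The overhead described in the EDIT can be roughly illustrated with a small sketch. This is not the asker's code: it assumes Python 3 (the question uses Python 2 syntax), the helper names tuple_footprint, string_footprint and sort_lines_by_column are hypothetical, and the sample line is made up. sys.getsizeof only reports the shallow size of an object, so the tuple's size and the sizes of the strings it references are summed by hand. Exact numbers vary by interpreter version and platform.

```python
import sys


def tuple_footprint(line, columns):
    """Approximate bytes used when a line is stored as a tuple of stripped fields."""
    fields = line.split("\t")
    record = tuple(fields[i].strip() for i in columns)
    # getsizeof is shallow: count the tuple itself plus every string it holds.
    return sys.getsizeof(record) + sum(sys.getsizeof(f) for f in record)


def string_footprint(line):
    """Bytes used when the whole line is kept as a single string."""
    return sys.getsizeof(line)


def sort_lines_by_column(lines, column):
    """Sort raw lines by one tab-separated column, parsed as an int.

    The split happens inside the key function, so no per-line tuple
    of fields is kept in the list being sorted.
    """
    return sorted(lines, key=lambda line: int(line.split("\t")[column]))


if __name__ == "__main__":
    # Made-up sample row: timestamp, person ID, x, y.
    sample = "1300000000\t42\t10.5\t20.25\n"
    print("tuple of fields:", tuple_footprint(sample, (0, 1, 2, 3)), "bytes")
    print("single string:  ", string_footprint(sample), "bytes")
```

Each small string pays a fixed per-object cost on top of its characters, which is why four short strings and a tuple take noticeably more room than one longer string. The trade-off in sort_lines_by_column is the one the EDIT mentions: less memory per stored line, at the price of re-splitting every line while sorting.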