As other folks have mentioned, for a really large file you're better off iterating.

However, you do commonly want the entire thing in memory for various reasons.

`genfromtxt` is much less efficient than `loadtxt` (though it handles missing data, whereas `loadtxt` is more "lean and mean", which is why the two functions co-exist).

If your data is very regular (e.g. just simple delimited rows, all of the same type), you can also improve on either by using `numpy.fromiter`.

If you have enough RAM, consider using `np.loadtxt('yourfile.txt', delimiter=',')`. (You may also need to specify `skiprows` if the file has a header.)

As a quick comparison, loading a ~500MB text file with `loadtxt` uses ~900MB of RAM at peak usage, while loading the same file with `genfromtxt` uses ~2.5GB.

**Loadtxt**

![Memory and CPU usage of numpy.loadtxt while loading a ~500MB ascii file](https://i.stack.imgur.com/noUl5.png)

---

**Genfromtxt**

![Memory and CPU usage of numpy.genfromtxt while loading a ~500MB ascii file](https://i.stack.imgur.com/C23j3.png)

---

Alternately, consider something like the following. It will only work for very simple, regular data, but it's quite fast. (`loadtxt` and `genfromtxt` do a lot of guessing and error-checking. If your data is very simple and regular, you can improve on them greatly.)

```python
import numpy as np

def generate_text_file(length=1_000_000, ncols=20):
    # Write a large CSV of random floats to benchmark against.
    data = np.random.random((length, ncols))
    np.savetxt('large_text_file.csv', data, delimiter=',')

def iter_loadtxt(filename, delimiter=',', skiprows=0, dtype=float):
    def iter_func():
        with open(filename, 'r') as infile:
            # Skip any header lines.
            for _ in range(skiprows):
                next(infile)
            # Yield one converted value at a time so np.fromiter can
            # build a flat 1D array without intermediate Python lists.
            for line in infile:
                line = line.rstrip().split(delimiter)
                for item in line:
                    yield dtype(item)
        # Record the column count from the last row read, so the
        # flat array can be reshaped after the generator is exhausted.
        iter_loadtxt.rowlength = len(line)

    data = np.fromiter(iter_func(), dtype=dtype)
    data = data.reshape((-1, iter_loadtxt.rowlength))
    return data

#generate_text_file()
data = iter_loadtxt('large_text_file.csv')
```

**Fromiter**

![Using fromiter to load the same ~500MB data file](https://i.stack.imgur.com/2dSkx.png)
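For completeness, here's a minimal sketch of the in-memory `loadtxt` route mentioned above. The file name `yourfile.txt` is a placeholder, and `skiprows=1` assumes a single header line; adjust both to match your data:

```python
import numpy as np

# 'yourfile.txt' is hypothetical; skiprows=1 discards one assumed header row.
data = np.loadtxt('yourfile.txt', delimiter=',', skiprows=1)
print(data.shape, data.dtype)  # e.g. (n_rows, n_cols) float64
```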
 
