<p>Using <code>numpy.concatenate</code> returns a regular in-memory array, so the data is loaded into RAM. To avoid this, you can create a third <code>memmap</code> array in a new file and copy in the values from the arrays you wish to concatenate. More efficiently, you can append the new arrays to a file that already exists on disk.</p>
<p>In either case you must choose the right memory order for the array (row-major or column-major).</p>
<p>The following examples illustrate how to concatenate along axis 0 and axis 1.</p>
<hr>
<p>1) concatenate along <code>axis=0</code></p>
<pre><code>import numpy as np

a = np.memmap('a.array', dtype='float64', mode='w+', shape=( 5000,1000)) # 38.1MB
a[:,:] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(15000,1000)) # 114 MB
b[:,:] = 222
</code></pre>
<p>You can define a third array that reads the same file as the first array to be concatenated (here <code>a</code>) in mode <code>r+</code> (read and write; the file is extended when the requested shape is larger than the existing file), but with the shape of the final array you want after concatenation:</p>
<pre><code>c = np.memmap('a.array', dtype='float64', mode='r+', shape=(20000,1000), order='C')
c[5000:,:] = b
</code></pre>
<p>Concatenating along <code>axis=0</code> does not require passing <code>order='C'</code>, because this is already the default order.</p>
<hr>
<p>2) concatenate along <code>axis=1</code></p>
<pre><code># 'a' is written in column-major order so that it can be extended column-wise
a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000,3000), order='F') # 114 MB
a[:,:] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(5000,1000)) # 38.1MB
b[:,:] = 222
</code></pre>
<p>The arrays saved on disk are actually flattened. If <code>a</code> were stored in the default row-major order and you created <code>c</code> with <code>mode='r+'</code> and <code>shape=(5000,4000)</code> without changing the array order, the first <code>1000</code> elements of the second row of <code>a</code> would end up in the first row of <code>c</code>.
You can avoid this by using <code>order='F'</code> (column-major), so that each column is contiguous on disk and new columns land at the end of the file. Note that <code>a</code> itself must have been written in column-major order; a file written in row-major order and reopened with <code>order='F'</code> would have its existing values reinterpreted in the wrong positions.</p>
<pre><code>c = np.memmap('a.array', dtype='float64', mode='r+', shape=(5000,4000), order='F')
c[:, 3000:] = b
</code></pre>
<hr>
<p>The file <code>a.array</code> now holds the concatenation result. You can repeat this process to concatenate more arrays, one pair at a time.</p>
<p>Related questions:</p>
<ul>
<li><a href="https://stackoverflow.com/q/16149803/832621">Working with big data in python and numpy, not enough ram, how to save partial results on disc?</a></li>
</ul>
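<p>As a way to check the layout reasoning above on small arrays, here is a minimal sketch that wraps both cases in helper functions and compares the result with <code>np.concatenate</code>. The names <code>append_rows</code>, <code>append_cols</code> and <code>demo</code>, as well as the file names, are hypothetical and chosen for illustration only. The sketch assumes that opening a <code>memmap</code> in mode <code>r+</code> with a larger shape extends the file, as in the examples above.</p>

```python
import os
import tempfile

import numpy as np


def append_rows(path, dtype, old_shape, new_rows):
    """Append rows to a row-major (order='C') array stored in `path`."""
    n_old, n_cols = old_shape
    out = np.memmap(path, dtype=dtype, mode='r+',
                    shape=(n_old + new_rows.shape[0], n_cols), order='C')
    out[n_old:, :] = new_rows  # only the new rows are written
    out.flush()
    return out


def append_cols(path, dtype, old_shape, new_cols):
    """Append columns to a column-major (order='F') array stored in `path`."""
    n_rows, n_old = old_shape
    out = np.memmap(path, dtype=dtype, mode='r+',
                    shape=(n_rows, n_old + new_cols.shape[1]), order='F')
    out[:, n_old:] = new_cols  # only the new columns are written
    out.flush()
    return out


def demo():
    """Return (rows_ok, cols_ok): whether each result matches np.concatenate."""
    a_vals = np.arange(20, dtype='float64').reshape(5, 4)
    with tempfile.TemporaryDirectory() as tmp:
        # axis=0: the existing file is row-major
        path0 = os.path.join(tmp, 'rows.array')
        a = np.memmap(path0, dtype='float64', mode='w+', shape=(5, 4))
        a[:] = a_vals
        a.flush()
        b = np.full((3, 4), 222.0)
        c = append_rows(path0, 'float64', (5, 4), b)
        rows_ok = np.array_equal(c, np.concatenate([a_vals, b], axis=0))
        del a, c  # release the mappings before the files are removed

        # axis=1: the existing file must be column-major
        path1 = os.path.join(tmp, 'cols.array')
        a = np.memmap(path1, dtype='float64', mode='w+', shape=(5, 4), order='F')
        a[:] = a_vals
        a.flush()
        b = np.full((5, 2), 222.0)
        c = append_cols(path1, 'float64', (5, 4), b)
        cols_ok = np.array_equal(c, np.concatenate([a_vals, b], axis=1))
        del a, c
    return rows_ok, cols_ok


rows_ok, cols_ok = demo()
print(rows_ok, cols_ok)
```

<p>Filling <code>a</code> with distinct values (rather than a constant) makes any row/column mix-up visible in the comparison.</p>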
 
