<p>As guessed in the comments, I can confirm that the array is processed in chunks. First I will show where this happens in the code, then how you can change the chunk size and what effect that has on your benchmark.</p>
<h2>Where to find the reduction processing in the Numpy source files</h2>
<p>np.all(x) is the same as x.all(), and all() really calls np.core.umath.logical_and.reduce(x).</p>
<p>If you want to dig into the numpy source, I will try to guide you to the place where a buffer/chunk size is used. All of the code we will be looking at is in the folder numpy/core/src/umath/.</p>
<p>PyUFunc_Reduce() in ufunc_object.c is the C function that handles the reduce. In PyUFunc_Reduce(), the chunk (or buffer) size is found by looking up the value for reduce in a global dictionary via the PyUFunc_GetPyValues() function (ufunc_object.c). On my machine, compiling from the development branch, the chunk size is 8192. PyUFunc_ReduceWrapper() in reduction.c is called to set up the iterator (with a stride equal to the chunk size), and it calls the passed-in loop function, which is reduce_loop() in ufunc_object.c.</p>
<p>reduce_loop() basically just uses the iterator and calls another innerloop() function for each chunk. The innerloop function is found in loops.c.src. For a boolean array and our case of all/logical_and, the appropriate innerloop function is BOOL_logical_and. You can find it by searching for BOOLEAN LOOPS; it is the second function below that (it is hard to find because of the template-like programming used here). There you will see that short-circuiting is in fact done, but separately for each chunk.</p>
<h2>How to change the buffer size used in ufuncs (and thus in any/all)</h2>
<p>You can get the chunk/buffer size with np.getbufsize(). For me, that returns 8192 without manually setting it, which matches what I found by printing out the buffer size in the code. You can use np.setbufsize() to change the chunk size.</p>
<h2>Results using a bigger buffer size</h2>
<p>I changed your benchmark code to the following:</p>
<pre><code>import timeit
import numpy as np

print('Numpy v%s' % np.version.full_version)
stmt = "np.all(x)"
for ii in range(9):
    # Buffer size follows the array size, clamped to [8192, 1E7].
    setup = "import numpy as np; x = np.zeros(%d, dtype=bool); np.setbufsize(%d)" % (10**ii, max(8192, min(10**ii, 10**7)))
    timer = timeit.Timer(stmt, setup)
    n, r = 1, 3
    t = np.min(timer.repeat(r, n))
    # Increase the loop count until one repeat takes at least 0.2 s.
    while t &lt; 0.2:
        n *= 10
        t = np.min(timer.repeat(r, n))
    t /= n
    if t &lt; 1E-3:
        timestr = "%1.3f us" % (t*1E6)
    elif t &lt; 1:
        timestr = "%1.3f ms" % (t*1E3)
    else:
        timestr = "%1.3f s" % t
    print("Array size: 1E%i, %i loops, best of %i: %s/loop" % (ii, n, r, timestr))
</code></pre>
<p>Numpy doesn't like the buffer size being too small or too big, so I made sure that it never got smaller than 8192 or larger than 1E7 (Numpy rejected a buffer size of 1E8). Otherwise, I set the buffer size to the size of the array being processed. I only went up to an array size of 1E8 because my machine only has 4GB of memory at the moment. Here are the results:</p>
<pre><code>Numpy v1.8.0.dev-2a5c2c8
Array size: 1E0, 100000 loops, best of 3: 5.351 us/loop
Array size: 1E1, 100000 loops, best of 3: 5.390 us/loop
Array size: 1E2, 100000 loops, best of 3: 5.366 us/loop
Array size: 1E3, 100000 loops, best of 3: 5.360 us/loop
Array size: 1E4, 100000 loops, best of 3: 5.433 us/loop
Array size: 1E5, 100000 loops, best of 3: 5.400 us/loop
Array size: 1E6, 100000 loops, best of 3: 5.397 us/loop
Array size: 1E7, 100000 loops, best of 3: 5.381 us/loop
Array size: 1E8, 100000 loops, best of 3: 6.126 us/loop
</code></pre>
<p>There is a small uptick in the last timing because the 1E8 array has to be processed in multiple chunks, due to the limit on how big the buffer size can be.</p>
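<p>To make the per-chunk short-circuiting concrete, here is a rough pure-Python model of the behaviour described above. It is only a sketch, not Numpy's actual C code: the function name chunked_all and its counters are hypothetical and exist purely for illustration. The assumption it encodes is that the outer loop still visits every chunk, while the inner loop stops as soon as the running result is False. The last few lines show np.setbufsize() and np.getbufsize() in use; 16384 is an arbitrary example value.</p>

```python
import numpy as np


def chunked_all(x, chunk_size=8192):
    """Model of a chunked logical-and reduction (hypothetical helper).

    Returns (result, chunks_visited, elements_examined).
    """
    x = np.asarray(x, dtype=bool).ravel()
    acc = True
    chunks_visited = 0
    elements_examined = 0
    # Outer loop: one pass per chunk; it does not stop early.
    for start in range(0, x.size, chunk_size):
        chunks_visited += 1
        # Inner loop: short-circuits once the accumulator is False.
        for value in x[start:start + chunk_size]:
            elements_examined += 1
            acc = acc and bool(value)
            if not acc:
                break
    return acc, chunks_visited, elements_examined


# All-False input: one element is examined per chunk, so the cost
# grows with the number of chunks rather than with the array size.
small_chunks = chunked_all(np.zeros(100000, dtype=bool), chunk_size=8192)
one_chunk = chunked_all(np.zeros(100000, dtype=bool), chunk_size=100000)

# Changing the real ufunc buffer size; setbufsize returns the old value.
previous = np.setbufsize(16384)
try:
    current = np.getbufsize()
finally:
    np.setbufsize(previous)  # restore the original setting
```

<p>In this model, the all-False array costs one element per chunk, so making the chunk as large as the array reduces the work to a single comparison. That is consistent with the nearly flat timings above and with the small uptick once more than one chunk is needed.</p>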