Numpy vs Cython speed

I have an analysis code that does some heavy numerical operations using numpy. Just out of curiosity, I tried to compile it with Cython with minor changes, and then I rewrote the numpy part using explicit loops.

To my surprise, the code based on loops was much faster (8x). I cannot post the complete code, but I put together a very simple unrelated computation that shows similar behavior (albeit the timing difference is not as big):

Version 1 (without Cython):

```python
import numpy as np

def _process(array):
    rows = array.shape[0]
    cols = array.shape[1]
    out = np.zeros((rows, cols))
    for row in range(0, rows):
        out[row, :] = np.sum(array - array[row, :], axis=0)
    return out

def main():
    data = np.load('data.npy')
    out = _process(data)
    np.save('vianumpy.npy', out)
```

Version 2 (built as a Cython module, keeping the numpy call):

```cython
import cython
cimport cython
import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef _process(np.ndarray[DTYPE_t, ndim=2] array):
    cdef unsigned int rows = array.shape[0]
    cdef unsigned int cols = array.shape[1]
    cdef unsigned int row
    cdef np.ndarray[DTYPE_t, ndim=2] out = np.zeros((rows, cols))
    for row in range(0, rows):
        out[row, :] = np.sum(array - array[row, :], axis=0)
    return out

def main():
    cdef np.ndarray[DTYPE_t, ndim=2] data
    cdef np.ndarray[DTYPE_t, ndim=2] out
    data = np.load('data.npy')
    out = _process(data)
    np.save('viacynpy.npy', out)
```

Version 3 (built as a Cython module, with the numpy call replaced by explicit loops):

```cython
import cython
cimport cython
import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef _process(np.ndarray[DTYPE_t, ndim=2] array):
    cdef unsigned int rows = array.shape[0]
    cdef unsigned int cols = array.shape[1]
    cdef unsigned int row
    cdef np.ndarray[DTYPE_t, ndim=2] out = np.zeros((rows, cols))
    for row in range(0, rows):
        for col in range(0, cols):
            for row2 in range(0, rows):
                out[row, col] += array[row2, col] - array[row, col]
    return out

def main():
    cdef np.ndarray[DTYPE_t, ndim=2] data
    cdef np.ndarray[DTYPE_t, ndim=2] out
    data = np.load('data.npy')
    out = _process(data)
    np.save('vialoop.npy', out)
```

With a 10000x10 matrix saved in data.npy, the times are:

```
$ python -m timeit -c "from version1 import main;main()"
10 loops, best of 3: 4.56 sec per loop

$ python -m timeit -c "from version2 import main;main()"
10 loops, best of 3: 4.57 sec per loop

$ python -m timeit -c "from version3 import main;main()"
10 loops, best of 3: 2.96 sec per loop
```

Is this expected, or is there an optimization that I am missing? The fact that versions 1 and 2 give roughly the same time is somewhat expected, but why is version 3 faster?

P.S. This is NOT the calculation that I need to make, just a simple example that shows the same behavior.
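The post does not include a build script for versions 2 and 3. For completeness, a minimal sketch of one way such modules are typically compiled is shown below, using the standard distutils + `cythonize` pattern; the file names `version2.pyx` and `version3.pyx` are assumptions here, not something stated in the original post.

```python
# setup.py -- minimal build sketch (assumed file names, not from the original post).
# Compiles the two Cython sources into importable extension modules.
import numpy as np
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(["version2.pyx", "version3.pyx"]),
    include_dirs=[np.get_include()],  # NumPy headers required by "cimport numpy"
)
```

Running `python setup.py build_ext --inplace` would make `version2` and `version3` importable next to `version1.py`. The timings above assume a 10000x10 array stored in `data.npy`; something like `np.save('data.npy', np.random.random((10000, 10)))` would produce an input of that shape, although the actual contents of the original file are not given.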