
Python for-loop slower each iteration
I am trying to optimize some Python code (to speed up some matrix operations). My code is similar to this one (my real dataset is also similar to `gps`):

```python
import numpy as np

def innerProd(mat1, mat2):
    return float(np.sum(np.dot(np.dot(mat1, mat2), mat1)))

gps = [np.random.rand(50, 50) for i in range(1000)]
ips = np.zeros((len(gps), len(gps)), dtype='float32')

for i in range(len(gps)):
    for j in range(i + 1):
        ips[i, j] = innerProd(gps[i], gps[j])
        ips[j, i] = ips[i, j]
    print("Inner product matrix: %3.0f %% done (%d of %d)"
          % (((i + 1) ** 2.) / (len(gps) ** 2.) * 100, i, len(gps)))
```

What I would like to understand is why the program runs fast during the first iterations and then slows down as it iterates further. I know the question might be a bit naive, but I really want a clearer idea of what is happening before I attempt anything else. I already implemented my function in Fortran (keeping all for-loops within the Fortran realm) and used f2py to create a dynamic library so I could call the function from Python. This would be the new Python code:

```python
import numpy as np
import myfortranInnProd as fip

gps = [np.random.rand(50, 50) for i in range(1000)]
ips = fip.innerProd(gps)
```

Unfortunately, I found out (surprisingly) that my Fortran/Python version runs 1.5 to 2 times slower than the first version (it is important to mention that I used MATMUL() in the Fortran implementation). I have been googling around for a while, and I believe this slowdown has something to do with memory bandwidth, memory allocation, or caching, given the large datasets, but I am not very sure what is really happening behind the scenes or how I could improve the performance. I have run the code on both a small Intel Atom with 2 GB of RAM and a 4-core Intel Xeon with 8 GB of RAM (with a correspondingly scaled dataset, of course), and the slowdown behavior is the same.

I just need to understand why this slowdown happens. Would it do any good to implement the function in C, or to try to run it on a GPU? Any other ideas on how to improve it? Thanks in advance.
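One possible direction, since `innerProd(A, B)` is the sum of all entries of `A @ B @ A`: that sum factors through the row and column sums of the outer matrix (`sum(A @ B @ A) == colsum(A) @ B @ rowsum(A)`), so the whole double loop can be vectorized in NumPy. A minimal sketch under that assumption (the helper name `inner_prod_matrix` is mine, not from the question):

```python
import numpy as np

def inner_prod_matrix(gps):
    """Compute ips[i, j] = sum(gps[i] @ gps[j] @ gps[i]) without a Python loop.

    Uses the identity sum(A @ B @ A) == colsum(A) @ B @ rowsum(A),
    which drops the cost per (i, j) pair from O(n^3) to O(n^2).
    """
    G = np.stack(gps)            # (N, n, n)
    col = G.sum(axis=1)          # column sums of each matrix, shape (N, n)
    row = G.sum(axis=2)          # row sums of each matrix, shape (N, n)
    # ips[i, j] = sum over b, c of col[i, b] * G[j, b, c] * row[i, c]
    return np.einsum('ib,jbc,ic->ij', col, G, row, optimize=True)
```

Note that the loop above fills only the lower triangle (`j <= i`) and mirrors it with `ips[j, i] = ips[i, j]`; since `sum(A @ B @ A)` is not symmetric in `A` and `B`, the vectorized result differs above the diagonal. To reproduce the mirrored matrix exactly, overwrite the upper triangle afterwards, e.g. `ips = np.tril(ips) + np.tril(ips, -1).T`.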
 


 