Is the multiprocessing module of Python the right way to speed up large numeric calculations?

<p>I have a strong background in numeric computation using FORTRAN and in parallelization with OpenMP, which I found easy enough to apply to many problems. I switched to Python since it is much more fun (at least for me) to develop with, but parallelizing numeric tasks seems much more tedious than with OpenMP. I am often interested in loading large (tens of GB) data sets into main memory and manipulating them in parallel while keeping only a single copy of the data in main memory (shared data). I started to use the Python multiprocessing module for this and came up with this generic example:</p>

<pre><code>#test cases
#python parallel_python_example.py 1000 1000
#python parallel_python_example.py 10000 50

import sys
import numpy as np
import time
import multiprocessing

n_dim = int(sys.argv[1])
n_vec = int(sys.argv[2])

#class which contains the large data set and the computationally heavy routine
class compute:
    def __init__(self, n_dim, n_vec):
        self.large_matrix = np.random.rand(n_dim, n_dim)  #define large random matrix
        self.many_vectors = np.random.rand(n_vec, n_dim)  #define many random vectors, organized as a matrix

    def dot(self, a, b):  #don't use numpy here, so each call runs on a single core only!!
        return sum(p*q for p, q in zip(a, b))

    def __call__(self, ii):  #use __call__ for the computation so the object can be handled by multiprocessing (pickle)
        vector = self.dot(self.large_matrix, self.many_vectors[ii, :])  #compute product of one of the vectors and the matrix
        return self.dot(vector, vector)  #return the squared "length" of the result vector

#initialize data
comp = compute(n_dim, n_vec)

#single core
tt = time.time()
result = [comp(ii) for ii in range(n_vec)]
time_single = time.time() - tt
print "Time:", time_single

#multi core
for prc in [1, 2, 4, 10]:  #the larger process counts are there to check that large_matrix exists only once in main memory
    tt = time.time()
    pool = multiprocessing.Pool(processes=prc)
    result = pool.map(comp, range(n_vec))
    pool.terminate()
    time_multi = time.time() - tt
    print "Time using %2i processes. Time: %10.5f, Speedup:%10.5f" % (prc, time_multi, time_single/time_multi)
</code></pre>

<p>I ran two test cases on my machine (64-bit Linux, Fedora 18) with the following results:</p>

<pre><code>andre@lot:python&gt;python parallel_python_example.py 10000 50
Time: 10.3667809963
Time using  1 processes. Time:   15.75869, Speedup:   0.65785
Time using  2 processes. Time:   11.62338, Speedup:   0.89189
Time using  4 processes. Time:   15.13109, Speedup:   0.68513
Time using 10 processes. Time:   31.31193, Speedup:   0.33108
andre@lot:python&gt;python parallel_python_example.py 1000 1000
Time: 4.9363951683
Time using  1 processes. Time:    5.14456, Speedup:   0.95954
Time using  2 processes. Time:    2.81755, Speedup:   1.75201
Time using  4 processes. Time:    1.64475, Speedup:   3.00131
Time using 10 processes. Time:    1.60147, Speedup:   3.08242
</code></pre>

<p>My question is: am I misusing the multiprocessing module here? Or is this just the way it goes with Python (i.e. don't parallelize within Python but rely entirely on numpy's optimizations)?</p>
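<p>For reference, the fully vectorized numpy alternative alluded to in the last paragraph could look roughly like the sketch below. This is only an illustration, not part of the original program: the array names simply mirror the example above, and the sizes are the smaller test case. On typical installations numpy hands the matrix product to an optimized (often multithreaded) BLAS, which is why a single vectorized call can beat pure-Python loops spread over a process pool.</p>

<pre><code>import numpy as np

n_dim, n_vec = 1000, 1000
large_matrix = np.random.rand(n_dim, n_dim)
many_vectors = np.random.rand(n_vec, n_dim)

#one BLAS-backed call computes the products of all vectors with the matrix
products = many_vectors.dot(large_matrix)

#squared "length" of each result vector, matching what compute.__call__ returns
result = np.einsum('ij,ij->i', products, products)
</code></pre>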