# Sharing numpy arrays in python multiprocessing pool
I'm working on some code that does fairly heavy numerical work on a large set of problems (tens to hundreds of thousands of numerical integrations). Fortunately, these integrations are embarrassingly parallel, so it's easy to use `Pool.map()` to split up the work across multiple cores.

Right now, I have a program with this basic workflow:

```python
#!/usr/bin/env python
from multiprocessing import Pool
from scipy import *
from my_parser import parse_numpy_array
from my_project import heavy_computation

# X is a global multidimensional numpy array
X = parse_numpy_array("input.dat")
param_1 = 0.0168
param_2 = 1.505

def do_work(arg):
    return heavy_computation(X, param_1, param_2, arg)

if __name__ == '__main__':
    pool = Pool()
    arglist = linspace(0.0, 1.0, 100)
    results = pool.map(do_work, arglist)
    # save results in a .npy file for analysis
    save("Results", [X, results])
```

Since `X`, `param_1`, and `param_2` are hard-coded and initialized in exactly the same way for each process in the pool, this all works fine. Now that I have my code working, I'd like the file name, `param_1`, and `param_2` to be input by the user at run time, rather than being hard-coded.

One thing that should be noted is that `X`, `param_1`, and `param_2` are not modified as the work is being done. Since I don't modify them, I could do something like this at the beginning of the program:

```python
import sys

X = parse_numpy_array(sys.argv[1])
param_1 = float(sys.argv[2])
param_2 = float(sys.argv[3])
```

That would do the trick, but since most users of this code run it from Windows machines, I'd rather not go the route of command-line arguments.

What I would really like to do is something like this:

```python
X, param_1, param_2 = None, None, None

def init(x, p1, p2):
    X = x
    param_1 = p1
    param_2 = p2

if __name__ == '__main__':
    filename = raw_input("Filename> ")
    param_1 = float(raw_input("Parameter 1: "))
    param_2 = float(raw_input("Parameter 2: "))
    X = parse_numpy_array(filename)
    pool = Pool(initializer=init, initargs=(X, param_1, param_2))
    arglist = linspace(0.0, 1.0, 100)
    results = pool.map(do_work, arglist)
    # save results in a .npy file for analysis
    save("Results", [X, results])
```

But, of course, this fails: `X`, `param_1`, and `param_2` are all `None` by the time the `pool.map` call happens. I'm pretty new to multiprocessing, so I'm not sure why the call to the initializer fails. Is there a way to do what I want to do? Is there a better way to go about this altogether? I've also looked at using shared data, but from my understanding of the documentation, that only works with ctypes, which don't include numpy arrays. Any help with this would be greatly appreciated.
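**Update:** reading around, I suspect the problem is that the assignments inside `init` create function-local names rather than rebinding the module-level globals, so the workers' copies of `X`, `param_1`, and `param_2` never get set. Here is a minimal sketch of what I believe the fixed initializer would look like, with `global` declarations added (untested against my real code; `parse_numpy_array` and `heavy_computation` are the same project helpers as above):

```python
from multiprocessing import Pool
from numpy import linspace, save

from my_parser import parse_numpy_array    # project helpers from above
from my_project import heavy_computation

X, param_1, param_2 = None, None, None

def init(x, p1, p2):
    # 'global' makes these assignments rebind the module-level names
    # in each worker process, instead of creating unused locals.
    global X, param_1, param_2
    X = x
    param_1 = p1
    param_2 = p2

def do_work(arg):
    return heavy_computation(X, param_1, param_2, arg)

if __name__ == '__main__':
    filename = raw_input("Filename> ")
    p1 = float(raw_input("Parameter 1: "))
    p2 = float(raw_input("Parameter 2: "))
    data = parse_numpy_array(filename)
    # initargs are pickled and passed to init() once per worker, so each
    # worker ends up with its own read-only copy of the array.
    pool = Pool(initializer=init, initargs=(data, p1, p2))
    results = pool.map(do_work, linspace(0.0, 1.0, 100))
    save("Results", [data, results])
```

If that's right, each worker still carries its own copy of `X`, which is fine for read-only use but costs memory for large arrays.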
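On the shared-data point: one idea I've seen mentioned is wrapping a `multiprocessing.Array` of ctypes doubles with `numpy.frombuffer`, so that a numpy array views the shared buffer without copying. A rough sketch of that idea follows (the dimensions and fill data are made up for illustration, and since it relies on fork semantics it would presumably need extra care on Windows, where child processes don't fork):

```python
import ctypes
from multiprocessing import Array
import numpy as np

ROWS, COLS = 1000, 10  # hypothetical dimensions

# Flat shared buffer of C doubles; lock=False is acceptable because the
# workers only ever read it after it has been filled.
shared = Array(ctypes.c_double, ROWS * COLS, lock=False)

# A numpy view onto the same memory -- no data is copied.
X = np.frombuffer(shared, dtype=np.float64).reshape(ROWS, COLS)
X[:] = np.random.rand(ROWS, COLS)  # stand-in for parse_numpy_array(...)
```

Workers forked after this point would see the same buffer through the module-level name, but I don't know whether that approach is worth the extra complexity here.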
 
