multiprocessing: sharing a large read-only object between processes?

Do child processes spawned via [multiprocessing](http://docs.python.org/library/multiprocessing.html) share objects created earlier in the program?

I have the following setup:

```python
import glob
import marshal
from multiprocessing import Pool

def do_some_processing(filename):
    for line in open(filename):
        if line.split(',')[0] in big_lookup_object:
            pass  # something here

if __name__ == '__main__':
    with open('file.bin', 'rb') as f:
        big_lookup_object = marshal.load(f)
    pool = Pool(processes=4)
    print(pool.map(do_some_processing, glob.glob('*.data')))
```

I'm loading a big object into memory, then creating a pool of workers that need to make use of that big object. The big object is accessed read-only; I don't need to pass modifications of it between processes.

My question is: is the big object loaded into shared memory, as it would be if I spawned a process in Unix/C, or does each process load its own copy of the big object?

Update: to clarify further, `big_lookup_object` is a shared lookup object. I don't need to split it up and process it separately; I need to keep a single copy of it. The work that I do need to split is reading lots of other large files and looking up the items in those files against the lookup object.

Further update: a database is a fine solution, memcached might be a better one, and a file on disk (shelve or dbm) might be better still. In this question I was particularly interested in an in-memory solution. For the final solution I'll be using Hadoop, but I wanted to see whether I can also have a local in-memory version.
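One way to probe this yourself is to compare the `fork` and `spawn` start methods. The sketch below is hypothetical (the names `probe` and the stand-in `set` are illustrative, and `fork` is POSIX-only): it populates a module-level global in the parent and reads it from the workers without ever passing it through `pool.map`.

```python
import multiprocessing as mp

big_lookup_object = None  # placeholder; populated in the parent below

def probe(_):
    # Under the "fork" start method the worker inherits the parent's
    # address space, so the populated global is visible here even
    # though it was never pickled or passed as an argument. Under
    # "spawn", the module is re-imported in the child and this would
    # still be None (len() would raise TypeError).
    return len(big_lookup_object)

if __name__ == '__main__':
    big_lookup_object = set(range(1_000_000))  # stand-in for the real lookup
    ctx = mp.get_context('fork')  # POSIX only
    with ctx.Pool(processes=4) as pool:
        print(pool.map(probe, range(4)))  # -> [1000000, 1000000, 1000000, 1000000]
```

Even under fork, though, the pages are only shared copy-on-write: CPython writes reference counts into every object it touches, so the memory the workers actually traverse is gradually copied into each child anyway.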
 

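For the shelve/dbm route mentioned in the last update, a minimal sketch under stated assumptions (the shelf file `lookup.db` is hypothetical and would be built once from `file.bin` beforehand) could look like this. Each worker reads the shelf straight from disk, so no process holds a private in-memory copy and the OS page cache does the sharing:

```python
import glob
import shelve
from multiprocessing import Pool

def do_some_processing(filename):
    # flag='r' opens the shelf read-only; concurrent readers are fine
    # for most dbm backends as long as nothing is writing to it
    hits = []
    with shelve.open('lookup.db', flag='r') as lookup:
        for line in open(filename):
            key = line.split(',')[0]
            if key in lookup:
                hits.append(key)
    return hits

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        print(pool.map(do_some_processing, glob.glob('*.data')))
```

Opening the shelf once per task is wasteful if the tasks are small; a `Pool(initializer=...)` that opens it once per worker process would be a cheaper variant of the same idea.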