<p>Well, this is kind of an interesting note. If you run the following commands:</p>

<pre><code>import numpy
from multiprocessing import Pool

a = numpy.arange(1000000)
pool = Pool(processes = 5)
result = pool.map(numpy.sin, a)

UnpicklingError: NEWOBJ class argument has NULL tp_new
</code></pre>

<p>Wasn't expecting that. So what's going on? Well:</p>

<pre><code>&gt;&gt;&gt; help(numpy.sin)
Help on ufunc object:

sin = class ufunc(__builtin__.object)
 |  Functions that operate element by element on whole arrays.
 |
 |  To see the documentation for a specific ufunc, use np.info().  For
 |  example, np.info(np.sin).  Because ufuncs are written in C
 |  (for speed) and linked into Python with NumPy's ufunc facility,
 |  Python's help() function finds this page whenever help() is called
 |  on a ufunc.
</code></pre>

<p>Yep, numpy.sin is implemented in C, and as such you can't really use it directly with multiprocessing.</p>

<p>So we have to wrap it with another function.</p>

<p>perf:</p>

<pre><code>import time
import numpy
from multiprocessing import Pool

def numpy_sin(value):
    return numpy.sin(value)

a = numpy.arange(1000000)
pool = Pool(processes = 5)

start = time.time()
result = numpy.sin(a)
end = time.time()
print 'Single threaded %f' % (end - start)

start = time.time()
result = pool.map(numpy_sin, a)
pool.close()
pool.join()
end = time.time()
print 'Multithreaded %f' % (end - start)

$ python perf.py
Single threaded 0.032201
Multithreaded 10.550432
</code></pre>

<p>Wow, wasn't expecting that either. Well, there are a couple of issues: for starters, we are using a Python function (even if it's just a wrapper) versus a pure C function, and there's also the overhead of copying the values. multiprocessing by default doesn't share data, so each value needs to be copied back and forth.</p>

<p>Do note that if we properly segment our data:</p>

<pre><code>import time
import numpy
from multiprocessing import Pool

def numpy_sin(value):
    return numpy.sin(value)

a = [numpy.arange(100000) for _ in xrange(10)]
pool = Pool(processes = 5)

start = time.time()
result = numpy.sin(a)
end = time.time()
print 'Single threaded %f' % (end - start)

start = time.time()
result = pool.map(numpy_sin, a)
pool.close()
pool.join()
end = time.time()
print 'Multithreaded %f' % (end - start)

$ python perf.py
Single threaded 0.150192
Multithreaded 0.055083
</code></pre>

<p>So what can we take from this? multiprocessing is great, but we should always test and compare: sometimes it's faster and sometimes it's slower, depending on how it's used ...</p>

<p>Granted, if you are not using <code>numpy.sin</code> but another function, I would recommend you first verify that multiprocessing will indeed speed up the computation; the overhead of copying values back and forth may affect you.</p>

<p>Either way, I also do <i>believe</i> that using <code>pool.map</code> is the best, safest method of multiprocessing code ...</p>

<p>I hope this helps.</p>
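The segmenting idea above generalizes: instead of building the list of chunks by hand, <code>numpy.array_split</code> can cut one big array into one piece per worker. A minimal Python 3 sketch of that approach (the <code>chunked_sin</code> and <code>n_chunks</code> names are mine, not from the original answer):

```python
# Python 3 sketch of the chunking idea above: split one big array into one
# piece per worker instead of mapping element by element.
# (Helper names chunked_sin / n_chunks are illustrative, not from the answer.)
import numpy
from multiprocessing import Pool

def numpy_sin(chunk):
    # Plain Python wrapper, since the C ufunc itself may not be picklable.
    return numpy.sin(chunk)

def chunked_sin(a, n_chunks=5):
    # One chunk per worker keeps the copy/pickle overhead proportional to
    # the number of chunks rather than the number of elements.
    chunks = numpy.array_split(a, n_chunks)
    with Pool(processes=n_chunks) as pool:
        results = pool.map(numpy_sin, chunks)
    return numpy.concatenate(results)

if __name__ == '__main__':
    a = numpy.arange(1000000)
    # The chunked result should match the single-process computation.
    assert numpy.allclose(chunked_sin(a), numpy.sin(a))
```

Note that <code>pool.map</code> also accepts a <code>chunksize</code> argument that batches items automatically, which is another way to reduce the per-element overhead shown in the first benchmark.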