- A [`Pipe()`](http://docs.python.org/library/multiprocessing.html#multiprocessing.Pipe) can only have two endpoints.
- A [`Queue()`](http://docs.python.org/library/multiprocessing.html#multiprocessing.Queue) can have multiple producers and consumers.

**When to use them**

If you need more than two points to communicate, use a [`Queue()`](http://docs.python.org/library/multiprocessing.html#multiprocessing.Queue).

If you need absolute performance, a [`Pipe()`](http://docs.python.org/library/multiprocessing.html#multiprocessing.Pipe) is much faster, because `Queue()` is built on top of `Pipe()`.
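To illustrate the fan-in case, here is a minimal sketch of my own (not part of the benchmark code below; the `producer` function and `worker-N` names are made up). Several producer processes share one `Queue()`, which a `Pipe()` cannot do because it has exactly two endpoints:

```python
from multiprocessing import Process, Queue

def producer(name, queue):
    for ii in range(3):
        queue.put("%s: item %d" % (name, ii))  # many writers, one queue
    queue.put('DONE')                          # one sentinel per producer

if __name__ == '__main__':
    queue = Queue()
    producers = [Process(target=producer, args=("worker-%d" % n, queue))
                 for n in range(3)]
    for p in producers:
        p.start()
    finished = 0
    while finished < len(producers):  # consume until every producer signals DONE
        msg = queue.get()
        if msg == 'DONE':
            finished += 1
        else:
            print(msg)
    for p in producers:
        p.join()
```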
**Performance Benchmarking**

Let's assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using `Pipe()` and `Queue()`... This is on a Thinkpad T61 running Ubuntu 11.10 and Python 2.7.2.

FYI, I threw in results for [`JoinableQueue()`](http://docs.python.org/library/multiprocessing.html#multiprocessing.JoinableQueue) as a bonus; `JoinableQueue()` accounts for tasks when `queue.task_done()` is called (it doesn't even know about the specific task, it just counts unfinished tasks in the queue), so that `queue.join()` knows the work is finished.

The code for each is at the bottom of this answer...

```
mpenning@mpenning-T61:~$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpenning@mpenning-T61:~$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$
```

In summary, `Pipe()` is about three times faster than a `Queue()`. Don't even think about the `JoinableQueue()` unless you really must have the benefits.

**BONUS MATERIAL 2**

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary under many conditions, but infrequently fails with certain inputs.

Normally we get clues to the failure when the entire python process crashes; however, you don't get unsolicited crash tracebacks printed to the console if the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash information is to wrap the entire multiprocessing function in a `try` / `except` and use `traceback.print_exc()`:

```python
import traceback

def reader(args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except:
        print "FATAL: reader({0}) exited while multiprocessing".format(args)
        traceback.print_exc()
```

Now, when you find a crash, you see something like:

```
FATAL: reader([{'crash', 'this'}]) exited while multiprocessing
Traceback (most recent call last):
  File "foo.py", line 19, in __init__
    self.run(task_q, result_q)
  File "foo.py", line 46, in run
    raise ValueError
ValueError
```
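A variation on the same idea (again my own sketch, not one of the original scripts): instead of printing inside the child, capture the formatted traceback with `traceback.format_exc()` and send it back through a result queue, so the parent process decides how to log the failure:

```python
import traceback
from multiprocessing import Process, Queue

def reader(args, result_q):
    try:
        result_q.put(('ok', args[0]['that']))  # normal result
    except Exception:
        # Ship the traceback text to the parent instead of losing it
        # when the child process dies.
        result_q.put(('error', traceback.format_exc()))

if __name__ == '__main__':
    result_q = Queue()
    p = Process(target=reader, args=([{'crash', 'this'}], result_q))
    p.start()
    status, payload = result_q.get()  # blocks until the child reports back
    p.join()
    if status == 'error':
        print("Worker crashed with:")
        print(payload)
```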
**Source Code:**

---

```python
"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time

def reader_proc(pipe):
    ## Read from the pipe; this will be spawned as a separate Process
    p_output, p_input = pipe
    p_input.close()              # We are only reading
    while True:
        msg = p_output.recv()    # Read from the output pipe and do nothing
        if msg == 'DONE':
            break

def writer(count, p_input):
    for ii in xrange(0, count):
        p_input.send(ii)         # Write 'count' numbers into the input pipe
    p_input.send('DONE')

if __name__ == '__main__':
    for count in [10**4, 10**5, 10**6]:
        # We use this Pipe() unidirectionally, with two endpoints:
        #     p_input ------> p_output
        p_output, p_input = Pipe()  # writer() writes to p_input from _this_ process
        reader_p = Process(target=reader_proc, args=((p_output, p_input),))
        reader_p.daemon = True
        reader_p.start()            # Launch the reader process

        p_output.close()            # We no longer need this part of the Pipe()
        _start = time.time()
        writer(count, p_input)      # Send a lot of stuff to reader_proc()
        p_input.close()
        reader_p.join()
        print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
            (time.time() - _start)))
```

---

```python
"""
multi_queue.py
"""
from multiprocessing import Process, Queue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()        # Read from the queue and do nothing
        if msg == 'DONE':
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)            # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__ == '__main__':
    pqueue = Queue()             # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=(pqueue,))
        reader_p.daemon = True
        reader_p.start()         # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader_proc()
        reader_p.join()          # Wait for the reader to finish
        print("Sending {0} numbers to Queue() took {1} seconds".format(count,
            (time.time() - _start)))
```

---

```python
"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()        # Read from the queue and do nothing
        queue.task_done()

def writer(count, queue):
    for ii in xrange(0, count):
        queue.put(ii)            # Write 'count' numbers into the queue

if __name__ == '__main__':
    for count in [10**4, 10**5, 10**6]:
        jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
        # reader_proc() reads from jqueue as a different process...
        reader_p = Process(target=reader_proc, args=(jqueue,))
        reader_p.daemon = True
        reader_p.start()         # Launch the reader process
        _start = time.time()
        writer(count, jqueue)    # Send a lot of stuff to reader_proc() (in different process)
        jqueue.join()            # Wait for the reader to finish
        print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count,
            (time.time() - _start)))
```
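One last sketch of my own (not part of the benchmarks above): although multi_pipe.py only sends data in one direction, a `Pipe()` is duplex by default, so its two endpoints can each both send and receive on the same connection:

```python
from multiprocessing import Process, Pipe

def child(conn):
    msg = conn.recv()        # receive the parent's message...
    conn.send(msg.upper())   # ...and reply over the same connection
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # duplex=True is the default
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send('ping')
    print(parent_conn.recv())         # prints: PING
    p.join()
```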