<p>"Do child processes spawned via multiprocessing share objects created earlier in the program?"</p>
<p>No.</p>
<p>Processes have independent memory spaces.</p>
<p><strong>Solution 1</strong></p>
<p>To make best use of a large structure with lots of workers, do this.</p>
<ol>
<li><p>Write each worker as a "filter" -- it reads intermediate results from stdin, does its work, and writes intermediate results to stdout.</p></li>
<li><p>Connect all the workers as a pipeline:</p> <pre><code>process1 &lt;source | process2 | process3 | ... | processn &gt;result </code></pre></li>
</ol>
<p>Each process reads, does work and writes.</p>
<p>This is remarkably efficient since all processes are running concurrently. The writes and reads pass directly through shared buffers between the processes.</p>
<hr>
<p><strong>Solution 2</strong></p>
<p>In some cases, you have a more complex structure -- often a "fan-out" structure. In this case you have a parent with multiple children.</p>
<ol>
<li><p>Parent opens source data. Parent forks a number of children.</p></li>
<li><p>Parent reads source, farms parts of the source out to each concurrently running child.</p></li>
<li><p>When the parent reaches the end of the source, it closes the pipes. Each child gets end of file and finishes normally.</p></li>
</ol>
<p>The child parts are pleasant to write because each child simply reads <code>sys.stdin</code>.</p>
<p>The parent has a little bit of fancy footwork in spawning all the children and retaining the pipes properly, but it's not too bad.</p>
<p>Fan-in is the opposite structure. A number of independently running processes need to interleave their outputs into a common process. The collector is not as easy to write, since it has to read from many sources.</p>
<p>Reading from many named pipes is often done using the <code>select</code> module to see which pipes have pending input.</p>
<hr>
<p><strong>Solution 3</strong></p>
<p>Shared lookup is the definition of a database.</p>
<p>Solution 3A -- load a database. Let the workers process the data in the database.</p>
<p>Solution 3B -- create a very simple server using <a href="http://werkzeug.pocoo.org/" rel="noreferrer">werkzeug</a> (or similar) to provide WSGI applications that respond to HTTP GET so the workers can query the server.</p>
<hr>
<p><strong>Solution 4</strong></p>
<p>Shared filesystem object. Unix OS offers shared memory objects. These are just files that are mapped to memory so that swapping I/O is done instead of more conventional buffered reads.</p>
<p>You can do this from a Python context in several ways:</p>
<ol>
<li><p>Write a startup program that (1) breaks your original gigantic object into smaller objects, and (2) starts workers, each with a smaller object. The smaller objects could be pickled Python objects to save a tiny bit of file reading time.</p></li>
<li><p>Write a startup program that reads your original gigantic object and writes a page-structured, byte-coded file using <code>seek</code> operations to ensure that individual sections are easy to find with simple seeks. This is what a database engine does -- break the data into pages, make each page easy to locate via a <code>seek</code>.</p> <p>Spawn workers with access to this large page-structured file. Each worker can seek to the relevant parts and do its work there.</p></li>
</ol>
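<p>The fan-out structure from Solution 2 can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the worker is a made-up filter that sums the integers it reads from <code>sys.stdin</code>, and the names <code>WORKER_SOURCE</code> and <code>fan_out</code> are hypothetical. It assumes each child writes only a small result at end of file, so the parent can finish writing before it reads anything back.</p>

```python
import subprocess
import sys

# Hypothetical worker "filter": reads integers from stdin, prints their sum at EOF.
WORKER_SOURCE = (
    "import sys\n"
    "print(sum(int(line) for line in sys.stdin))\n"
)


def fan_out(items, n_workers=3):
    """Deal items round-robin to child processes and collect one result per child."""
    # Parent spawns the children, keeping a pipe to each child's stdin and stdout.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n_workers)
    ]

    # Parent reads the source and farms parts of it out to each running child.
    for i, item in enumerate(items):
        children[i % n_workers].stdin.write(f"{item}\n")

    # Closing the pipe gives each child end of file, so it finishes normally.
    for child in children:
        child.stdin.close()

    # Each child emits a single small result, so reading after the writes is safe here.
    results = []
    for child in children:
        results.append(int(child.stdout.read()))
        child.wait()
    return results


if __name__ == "__main__":
    print(fan_out(range(1, 11)))  # prints [22, 15, 18]
```

<p>If the children produced large outputs while the parent was still writing, the pipes could fill up and stall both sides; in that case the parent would need to interleave reads and writes, for example with the <code>select</code> module mentioned above.</p>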