<p>When you create a <code>multiprocessing.Pool</code>, the <code>multiprocessing</code> module creates several new processes (using <code>os.fork</code> or similar).</p>
<p>By default, during a <code>fork</code>, the new processes inherit all open file descriptors.</p>
<p>When you invoke <code>subprocess.Popen</code> with a <code>subprocess.PIPE</code> argument, the <code>subprocess</code> module creates new pipe file descriptors to send data to or from the new process. In this particular case, the pipe carries data from the parent process (python) to the child (gzip), and gzip will exit, and thus let <code>proc.wait()</code> finish, only when <em>all</em> write access to the pipe goes away. (This is what generates "EOF on a pipe": no writable file descriptors remain open on that pipe.)</p>
<p>Thus, in this case, if you (all in the "original" python process) do the following, in this order:</p>
<ol> <li>create a pipe</li> <li>create some <code>multiprocessing.Pool</code> processes</li> <li>send data to gzip</li> <li>close the pipe to gzip</li> </ol>
<p>then, due to the behavior of <code>fork</code>, each of the Pool processes holds its own copy (the equivalent of an <code>os.dup</code>) of the write end of the pipe to gzip, so gzip keeps waiting for more data, which those Pool processes could send (but never do). The gzip process will exit as soon as the Pool processes close their pipe descriptors.</p>
<p>Fixing this in real (more complicated) code can be nontrivial. Ideally, you would like <code>multiprocessing.Pool</code> to know (magically, somehow) which file descriptors should be retained and which should not, but this is not as simple as "just close a bunch of descriptors in the created child processes":</p>
<pre><code>import multiprocessing

output = open('somefile', 'a')

def somefunc(arg):
    # ... do some computation, etc., producing result ...
    output.write(result)

pool = multiprocessing.Pool()
pool.map(somefunc, iterable)
</code></pre>
<p>Clearly <code>output.fileno()</code> must be shared by the worker processes here.</p>
<p>You could try to use the <code>Pool</code>'s <code>initializer</code> to invoke <code>proc.stdin.close</code> (or <code>os.close</code> on a list of file descriptors), but then you need to keep track of which descriptors to close. It's probably simplest to restructure your code to avoid creating a pool "at the wrong time".</p>
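<p>As a rough sketch of the two workarounds mentioned above, assuming a POSIX system where the <code>fork</code> start method is available: the names <code>CHILD</code>, <code>pool_first</code> and <code>close_in_initializer</code> are made up for illustration, a small Python one-liner stands in for gzip (it reads stdin until EOF and prints the byte count), and the built-in <code>abs</code> stands in for real worker code.</p>

```python
import multiprocessing
import subprocess
import sys

# Hypothetical stand-in for gzip: read stdin until EOF, then print the byte count.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))"]


def pool_first(data):
    """Restructured order: fork the workers before the pipe exists."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX, fork available
    with ctx.Pool(2) as pool:
        # The workers were forked above, so they hold no copy of this pipe.
        proc = subprocess.Popen(CHILD, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
        results = pool.map(abs, [-2, -1, 0, 1])  # placeholder work
        # communicate() writes data, closes stdin, reads stdout and waits.
        out, _ = proc.communicate(data, timeout=30)
    return results, int(out)


def close_in_initializer(data):
    """Original order: the pipe exists first, so each worker closes its copy."""
    ctx = multiprocessing.get_context("fork")
    proc = subprocess.Popen(CHILD, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    # With fork, the initializer runs in each worker on the inherited file
    # object, which closes that worker's copy of the pipe's write end.
    with ctx.Pool(2, initializer=proc.stdin.close) as pool:
        results = pool.map(abs, [-2, -1, 0, 1])
        # The workers are still alive here; without the initializer this call
        # would wait for an EOF that never arrives and hit the timeout.
        out, _ = proc.communicate(data, timeout=30)
    return results, int(out)
```

<p>Calling <code>pool_first(b"hello")</code> or <code>close_in_initializer(b"hello")</code> should return promptly in both cases. Dropping the <code>initializer</code> argument from the second function reproduces the hang described above, which the <code>timeout</code> turns into a <code>subprocess.TimeoutExpired</code> exception.</p>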
 
