<p>The MPI specification for <code>MPI_COMM_FREE</code> states that <em>"... the object is actually deallocated only if there are no other active references to it."</em> You can disconnect processes by calling <code>MPI_COMM_DISCONNECT</code> on both ends of all intercommunicators that link them. The equivalent mpi4py call is <code>icomm.Disconnect()</code>.</p>
<p>Still, the error you see probably comes from <code>orterun</code> (symlinked as <code>mpirun</code> and <code>mpiexec</code>) and not from the master rank. <code>orterun</code> is the process that launches all MPI processes (the initial ones and those spawned later) and redirects their standard output to its own, so that you can see the output from each rank. When processes are started on the local host, <code>orterun</code> uses a simple <code>fork()</code>/<code>exec()</code> mechanism, part of the <code>odls</code> framework, to spawn new ranks, and it uses pipes both to detect a successful launch and to forward I/O. The launch detection pipes are open only for a very short time, but the I/O forwarding pipes remain open for as long as the rank is running. If many ranks are running at the same time, many pipes stay open, hence the error message.</p>
<p>The error message is a bit misleading, since there are two cases of "too many descriptors" and Open MPI does not distinguish between them. The first case is when the system-wide kernel limit is reached, but this is usually a very large value. The second case is when the per-process limit on the number of file descriptors is reached. The latter can be controlled with the <code>ulimit</code> command. Check the current value with <code>ulimit -n</code> and increase it if necessary. For example:</p>
<pre><code>user@host$ ulimit -n 123456
user@host$ mpiexec -n 1 ... ./spawning_code.py arg1 arg2 ...
</code></pre>
<p>Here <code>123456</code> is the desired limit on the number of descriptors. It cannot exceed the hard limit, which can be obtained with <code>ulimit -nH</code>. If you run your program from a script (either for convenience or because you submit jobs to a batch queueing system), put the <code>ulimit -n</code> line in the script before the call to <code>mpirun</code>/<code>mpiexec</code>.</p>
<p>Note that in the text above the words <em>rank</em> and <em>process</em> refer to the same thing.</p>
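<p>If a shell wrapper is inconvenient, the same adjustment could be made from a small Python launcher, since resource limits are inherited by child processes. The following is only a sketch under stated assumptions: the helper name <code>raise_fd_limit</code> and the launcher script itself are hypothetical, it relies on the Unix-only standard <code>resource</code> module, and it must run in the process that starts <code>mpiexec</code>, because raising the limit inside the spawning rank would not affect <code>orterun</code>.</p>

```python
import resource
import subprocess
import sys


def raise_fd_limit(desired):
    """Raise the soft descriptor limit toward `desired` (hypothetical helper).

    The value is capped at the hard limit and the limit is never lowered.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means there is no hard cap; otherwise the hard limit is the ceiling.
    target = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # already at least as high as requested
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the system refused; keep the current limit
    return target


if __name__ == "__main__":
    # Assumed usage: python launch.py mpiexec -n 1 ./spawning_code.py arg1 arg2
    new_soft = raise_fd_limit(123456)
    print("soft descriptor limit:", new_soft, file=sys.stderr)
    if len(sys.argv) > 1:
        # The child (mpiexec/orterun) inherits the raised limit.
        sys.exit(subprocess.call(sys.argv[1:]))
```

<p>The hard limit is left untouched, so an unprivileged user can run this; if the hard limit itself is too low, it has to be raised by an administrator.</p>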
 
