MPI Error: Out of Memory - What are some solution options?
<p>I am trying to resolve <a href="https://stackoverflow.com/questions/6221058/fatal-error-in-mpi-irecv-aborting-job">Fatal Error in MPI_Irecv: Aborting Job</a> and received mixed (useful but incomplete) responses to that question.</p>
<p>The <strong>error message</strong> is the following:</p>
<pre><code>aborting job:
Fatal error in MPI_Irecv: Other MPI error, error stack:
MPI_Irecv(143): MPI_Irecv(buf=0x8294a60, count=48, MPI_DOUBLE, src=2, tag=-1, MPI_COMM_WORLD, request=0xffffd6ac) failed
MPID_Irecv(64): Out of memory
</code></pre>
<p><strong><em>I would appreciate help answering the following questions (I need guidance to debug and resolve this deadlock):</em></strong></p>
<ol>
<li><p>After a non-blocking MPI send or receive (MPI_Isend/MPI_Irecv) has completed, is its memory freed automatically, or does it have to be freed explicitly?</p></li>
<li><p>Will the "Out of memory" error go away if I use multiple cores instead of a single one? We currently run 4 processes on 1 core, and I submit my job with <code>mpirun -np 4</code>. I also tried <code>mpirun n -4</code>, but it still ran 4 processes on the same core.</p></li>
<li><p>How do I determine how much shared memory my program requires?</p></li>
</ol>
<p>The MPI_Isend/MPI_Irecv calls sit inside nested loops that run many times, so it is not clear whether the error originates there. (If I issue the send/receive calls only once or twice, the program runs fine without any "Out of memory" errors.) If that is the cause, how can I check for it and release that memory?</p>
<pre><code>#include &lt;mpi.h&gt;
#define Rows 48

double *A = new double[Rows];
double *AA = new double[Rows];
....
....

int main (int argc, char *argv[])
{
    MPI_Status status[8];
    MPI_Request request[8];

    MPI_Init (&amp;argc, &amp;argv);
    MPI_Comm_size(MPI_COMM_WORLD, &amp;p);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;my_rank);

    while (time &lt; final time){
        ...
        ...
        for (i=0; i&lt;Columns; i++)
        {
            for (y=0; y&lt;Rows; y++)
            {
                if ((my_rank) == 0)
                {
                    MPI_Isend(A, Rows, MPI_DOUBLE, my_rank+1, 0, MPI_COMM_WORLD, &amp;request[1]);
                    MPI_Irecv(AA, Rows, MPI_DOUBLE, my_rank+1, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[3]);
                    MPI_Wait(&amp;request[3], &amp;status[3]);

                    MPI_Isend(B, Rows, MPI_DOUBLE, my_rank+2, 0, MPI_COMM_WORLD, &amp;request[5]);
                    MPI_Irecv(BB, Rows, MPI_DOUBLE, my_rank+2, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[7]);
                    MPI_Wait(&amp;request[7], &amp;status[7]);
                }

                if ((my_rank) == 1)
                {
                    MPI_Irecv(CC, Rows, MPI_DOUBLE, my_rank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[1]);
                    MPI_Wait(&amp;request[1], &amp;status[1]);
                    MPI_Isend(Cmpi, Rows, MPI_DOUBLE, my_rank-1, 0, MPI_COMM_WORLD, &amp;request[3]);

                    MPI_Isend(D, Rows, MPI_DOUBLE, my_rank+2, 0, MPI_COMM_WORLD, &amp;request[6]);
                    MPI_Irecv(DD, Rows, MPI_DOUBLE, my_rank+2, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[8]);
                    MPI_Wait(&amp;request[8], &amp;status[8]);
                }

                if ((my_rank) == 2)
                {
                    MPI_Isend(E, Rows, MPI_DOUBLE, my_rank+1, 0, MPI_COMM_WORLD, &amp;request[2]);
                    MPI_Irecv(EE, Rows, MPI_DOUBLE, my_rank+1, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[4]);
                    MPI_Wait(&amp;request[4], &amp;status[4]);

                    MPI_Irecv(FF, Rows, MPI_DOUBLE, my_rank-2, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[5]);
                    MPI_Wait(&amp;request[5], &amp;status[5]);
                    MPI_Isend(Fmpi, Rows, MPI_DOUBLE, my_rank-2, 0, MPI_COMM_WORLD, &amp;request[7]);
                }

                if ((my_rank) == 3)
                {
                    MPI_Irecv(GG, Rows, MPI_DOUBLE, my_rank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[2]);
                    MPI_Wait(&amp;request[2], &amp;status[2]);
                    MPI_Isend(G, Rows, MPI_DOUBLE, my_rank-1, 0, MPI_COMM_WORLD, &amp;request[4]);

                    MPI_Irecv(HH, Rows, MPI_DOUBLE, my_rank-2, MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request[6]);
                    MPI_Wait(&amp;request[6], &amp;status[6]);
                    MPI_Isend(H, Rows, MPI_DOUBLE, my_rank-2, 0, MPI_COMM_WORLD, &amp;request[8]);
                }
</code></pre>
<p>Thanks!</p>
 
