Can I bundle two MPI messages?
<p>I am trying to do out-of-order, all-to-one communication. I have multiple floating-point arrays of the same size, each identified by an integer id.</p>
<p>Each message should look like:</p>
<pre><code>&lt;int id&gt;&lt;float array data&gt; </code></pre>
<p>The receiver knows exactly how many arrays there are, so it posts exactly that many receives. Upon receiving a message, it parses the id and puts the data in the right place. The problem is that a message could come from any of the other processes (e.g. the producers have a work queue structure and process whichever id is available on the queue).</p>
<p>Since MPI only guarantees in-order delivery between a given pair of processes, I can't trivially put the integer id and the FP data in two separate messages; otherwise the receiver might not be able to match an id with its data. As far as I know, MPI doesn't allow two types of data in one send either.</p>
<p>I can only think of two approaches.</p>
<p>1) The receiver keeps an array of size m (source[m]), m being the number of sending nodes. The sender sends the id first, then the data. The receiver saves the id to source[i] after receiving an integer message from sender i. Upon receiving an FP array from sender i, it checks source[i], gets the id, and moves the data to the right place. This works because MPI guarantees in-order P2P communication, but it requires the receiver to keep state for each sender. To make matters worse, if a single sending process can send two ids before the corresponding data (e.g. when it is multi-threaded), this mechanism won't work.</p>
<p>2) Treat the id and the FP data as bytes and copy them into a send buffer. Send the buffer as MPI_CHAR, and have the receiver cast the bytes back to an integer and an FP array. This means paying the additional cost of copying everything into a byte buffer on the sender side. The total temporary buffer space also grows with the number of threads within an MPI process.</p>
<p>Neither of them is a perfect solution. I don't want to lock anything inside a process. I wonder if any of you have better suggestions.</p>
<p>Edit: The code will run on a shared cluster with InfiniBand. The machines will be randomly assigned, so I don't think TCP sockets will help me here. In addition, IPoIB looks expensive. I need the full 40 Gbps for communication, and I need to keep the CPU doing the computation.</p>
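<p>To make approach 2 concrete, here is a minimal sketch of just the copy step, written in plain C with no MPI calls so it compiles on its own. The helper names pack_message and unpack_message are hypothetical, made up for this illustration. The sketch assumes sender and receiver share the same byte order and the same sizes for int and float, which seems reasonable on a homogeneous cluster but is still an assumption.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the id followed by n floats into buf.
 * buf must hold at least sizeof(int) + n * sizeof(float) bytes.
 * Returns the number of bytes written. */
size_t pack_message(unsigned char *buf, int id, const float *data, size_t n)
{
    memcpy(buf, &id, sizeof id);                      /* header: the id */
    memcpy(buf + sizeof id, data, n * sizeof *data);  /* payload: the array */
    return sizeof id + n * sizeof *data;
}

/* Hypothetical helper: inverse of pack_message.
 * Copies n floats out of buf into data and returns the id.
 * memcpy is used instead of pointer casts to avoid alignment problems. */
int unpack_message(const unsigned char *buf, float *data, size_t n)
{
    int id;
    memcpy(&id, buf, sizeof id);
    memcpy(data, buf + sizeof id, n * sizeof *data);
    return id;
}
```

<p>The packed buffer would then go out in a single send of the returned byte count, and the receiver would call unpack_message on whatever arrives. The extra copy on the sender side is exactly the cost described in approach 2.</p>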
 
