
# InfiniBand RDMA: poor transfer bandwidth
In my application I use an InfiniBand infrastructure to send a stream of data from one server to another. To ease development I have been using IP over InfiniBand, because I'm more familiar with socket programming. Until now the performance (maximum bandwidth) was good enough for me (I knew I wasn't getting the maximum achievable bandwidth); now I need to get more bandwidth out of that InfiniBand connection.

ib_write_bw claims that my maximum achievable bandwidth is around 1500 MB/s (I'm not getting 3000 MB/s because my card is installed in a PCIe 2.0 x8 slot).

So far so good. I coded my communication channel using ibverbs and RDMA, but I'm getting far less than that bandwidth; I'm even getting a bit less bandwidth than with sockets, although at least my application doesn't use any CPU power:

- ib_write_bw: 1500 MB/s
- sockets: 700 MB/s (one core of my system is at 100% during this test)
- ibverbs + RDMA: 600 MB/s (no CPU is used at all during this test)

It seems that the bottleneck is here:

```
ibv_sge sge;
sge.addr   = (uintptr_t)memory_to_transfer;
sge.length = memory_to_transfer_size;
sge.lkey   = memory_to_transfer_mr->lkey;

ibv_send_wr wr;
memset(&wr, 0, sizeof(wr));
wr.wr_id      = 0;
wr.opcode     = IBV_WR_RDMA_WRITE;
wr.sg_list    = &sge;
wr.num_sge    = 1;
wr.send_flags = IBV_SEND_SIGNALED;
wr.wr.rdma.remote_addr = (uintptr_t)thePeerMemoryRegion.addr;
wr.wr.rdma.rkey        = thePeerMemoryRegion.rkey;

ibv_send_wr *bad_wr = NULL;
if (ibv_post_send(theCommunicationIdentifier->qp, &wr, &bad_wr) != 0) {
  notifyError("Unable to ibv_post_send");
}
```

At this point the next code is waiting for completion:

```
// Wait for completion
ibv_cq *cq;
void   *cq_context;
if (ibv_get_cq_event(theCompletionEventChannel, &cq, &cq_context) != 0) {
  notifyError("Unable to get an ibv cq event");
}
ibv_ack_cq_events(cq, 1);
if (ibv_req_notify_cq(cq, 0) != 0) {
  notifyError("Unable to request cq notification");
}
ibv_wc wc;
int myRet = ibv_poll_cq(cq, 1, &wc);
if (myRet > 1) {
  LOG(WARNING) << "Got more than a single ibv_wc, expecting one";
}
```

The time between my ibv_post_send and when ibv_get_cq_event returns an event is 13.3 ms when transferring chunks of 8 MB, which works out to the roughly 600 MB/s I measure (8 MB / 13.3 ms ≈ 600 MB/s).

To be more specific, this is what I do globally, in pseudocode:

Active side:

```
post a message receive
rdma connect
wait for rdma connection event
<<at this point the tx transfer flow starts>>
start:
  register memory containing the bytes to transfer
  wait for remote memory region addr/key (I wait for an ibv_wc)
  send data with ibv_post_send
  post a message receive
  wait for the ibv_post_send completion event (I wait for an ibv_wc) (this lasts 13.3 ms)
  send "DONE" message
  unregister memory
  goto start
```

Passive side:

```
post a message receive
rdma accept
wait for rdma connection event
<<at this point the rx transfer flow starts>>
start:
  register memory that has to receive the bytes
  send addr/key of the registered memory
  wait for "DONE" message
  unregister memory
  post a message receive
  goto start
```

Does anyone know what I'm doing wrong, or what I can improve? I'm not affected by "Not Invented Here" syndrome, so I'm even open to throwing away what I have done so far and adopting something else. I only need a point-to-point contiguous transfer.
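
**Edit:** to make the "register memory" step in the pseudocode concrete, this is roughly what both sides do on every iteration (a simplified sketch, not my exact code; `pd`, `buffer`, and `size` are placeholders for my protection domain and transfer buffer):

```
#include <infiniband/verbs.h>

// Register the transfer buffer for this iteration (sketch; names are
// placeholders). The receiving side needs IBV_ACCESS_REMOTE_WRITE so
// the active side's IBV_WR_RDMA_WRITE is allowed to land in it. The
// region is released again with ibv_dereg_mr() after the "DONE" message.
ibv_mr *registerTransferBuffer(ibv_pd *pd, void *buffer, size_t size)
{
    ibv_mr *mr = ibv_reg_mr(pd, buffer, size,
                            IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
    if (mr == NULL) {
        notifyError("Unable to ibv_reg_mr");
    }
    return mr;
}
```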
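
I have also been wondering whether posting one big signaled write per 8 MB transfer is part of the problem. Below is a sketch of the same transfer expressed as a chain of smaller writes posted in a single call, with only the last one signaled (it reuses the variables from the snippet above; the 1 MB chunk size is arbitrary, and the QP's `max_send_wr` would have to be at least `nChunks`). Would keeping several requests on the wire like this be expected to help, or is the per-iteration registration the dominant cost?

```
#include <infiniband/verbs.h>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch only: split the transfer into a linked chain of RDMA writes.
// memory_to_transfer, memory_to_transfer_mr, thePeerMemoryRegion, and
// theCommunicationIdentifier are the same variables as above; the size
// is assumed to be a multiple of kChunk for brevity.
const size_t kChunk  = 1 << 20; // 1 MB, arbitrary
size_t       nChunks = memory_to_transfer_size / kChunk;

std::vector<ibv_sge>     sges(nChunks);
std::vector<ibv_send_wr> wrs(nChunks);
memset(wrs.data(), 0, nChunks * sizeof(ibv_send_wr));

for (size_t i = 0; i < nChunks; ++i) {
    sges[i].addr   = (uintptr_t)memory_to_transfer + i * kChunk;
    sges[i].length = kChunk;
    sges[i].lkey   = memory_to_transfer_mr->lkey;

    wrs[i].wr_id   = i;
    wrs[i].opcode  = IBV_WR_RDMA_WRITE;
    wrs[i].sg_list = &sges[i];
    wrs[i].num_sge = 1;
    // Request a completion only for the last chunk.
    wrs[i].send_flags = (i == nChunks - 1) ? IBV_SEND_SIGNALED : 0;
    wrs[i].wr.rdma.remote_addr = (uintptr_t)thePeerMemoryRegion.addr + i * kChunk;
    wrs[i].wr.rdma.rkey        = thePeerMemoryRegion.rkey;
    // Chain the work requests so one ibv_post_send posts them all.
    wrs[i].next = (i + 1 < nChunks) ? &wrs[i + 1] : NULL;
}

ibv_send_wr *bad_wr = NULL;
if (ibv_post_send(theCommunicationIdentifier->qp, &wrs[0], &bad_wr) != 0) {
    notifyError("Unable to ibv_post_send");
}
```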