<p>To get copy/compute overlap, you must use pinned (page-locked) host memory; the reason is given in the paragraph you excerpted. Presumably the whole point of your multi-stream approach is copy/compute overlap, so I don't think the correct answer is to switch to pageable memory buffers.</p>
<p>Regarding your question: assuming <code>h_mem</code> is used only as the source buffer in the pseudo-code you've shown (i.e. its data participates only in that one <code>cudaMemcpyAsync</code> call), then the <code>h_mem</code> buffer is no longer needed once the <em>next</em> CUDA operation in that stream begins, because operations issued to the same stream execute in order. So if your <code>kernel_launch</code> were an actual <code>kernel&lt;&lt;&lt;...&gt;&gt;&gt;(...)</code>, then once <code>kernel</code> begins, you can be assured that the previous <code>cudaMemcpyAsync</code> is complete.</p>
<p>To determine from host code when the copy has completed, you could use CUDA events with <code>cudaEventSynchronize()</code> or <code>cudaStreamWaitEvent()</code>, or you could call <code>cudaStreamSynchronize()</code> directly on the stream. For example, if the stream pseudo-code you have shown contains a <code>cudaStreamSynchronize()</code> call somewhere after the <code>cudaMemcpyAsync</code> call, then any code after the <code>cudaStreamSynchronize()</code> call is guaranteed to execute after the <code>cudaMemcpyAsync()</code> call is complete. All of the calls I've referenced are documented in the <a href="http://docs.nvidia.com/cuda/cuda-runtime-api/index.html" rel="nofollow">usual place</a>.</p>
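<p>As a minimal sketch of the ordering described above, the following code issues an asynchronous copy from a pinned buffer, records an event right after it, and launches a kernel in the same stream. The kernel <code>scale</code>, the function name <code>run_overlap_demo</code>, the buffer size, and the launch configuration are assumptions made for illustration; they are not taken from the original question.</p>

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bail out of the demo on any CUDA error (resources are not freed on this path).
#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Hypothetical kernel: doubles each element in place.
__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Returns 0 on success, 1 on a CUDA error, 2 on a wrong result.
int run_overlap_demo(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_mem = nullptr, *d_mem = nullptr;
    cudaStream_t stream;
    cudaEvent_t copied;

    // Pinned host memory, as required for copy/compute overlap.
    CHECK(cudaMallocHost((void **)&h_mem, bytes));
    CHECK(cudaMalloc((void **)&d_mem, bytes));
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&copied));

    for (int i = 0; i < n; ++i) h_mem[i] = 1.0f;

    // The copy, the event, and the kernel execute in issue order in this stream.
    CHECK(cudaMemcpyAsync(d_mem, h_mem, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaEventRecord(copied, stream));  // completes once the copy is done
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_mem, n);
    CHECK(cudaGetLastError());

    // Option 1: block the host only until the copy has finished.
    CHECK(cudaEventSynchronize(copied));
    h_mem[0] = -1.0f;  // safe: the copy no longer reads h_mem

    // Copy the result back, then (option 2) wait for everything in the stream.
    CHECK(cudaMemcpyAsync(h_mem, d_mem, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    int ok = (h_mem[0] == 2.0f) && (h_mem[n - 1] == 2.0f);

    cudaEventDestroy(copied);
    cudaStreamDestroy(stream);
    cudaFree(d_mem);
    cudaFreeHost(h_mem);
    return ok ? 0 : 2;
}
```

<p>Calling <code>run_overlap_demo()</code> from <code>main</code> and checking for a zero return value exercises both synchronization options. The event-based wait is the finer-grained choice when only the host buffer needs to be reused, since it does not wait for the kernel.</p>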