The copying activity of `cudaMemcpyAsync` (as well as kernel activity) can be overlapped with *any* host code. Furthermore, data copy to and from the device (via `cudaMemcpyAsync`) can be overlapped with kernel activity. All three activities (host activity, data copy activity, and kernel activity) can be done asynchronously to each other and can overlap each other.

As you have seen and demonstrated, host activity and data copy or kernel activity can be overlapped in a relatively straightforward fashion: kernel launches return immediately to the host, as does `cudaMemcpyAsync`. However, to get the best overlap between data copy and kernel activity, a few additional ingredients are necessary:

1. Host memory buffers that are pinned, e.g. via `cudaHostAlloc()`
2. Use of CUDA streams to separate the various types of activity (data copy and kernel computation)
3. Use of `cudaMemcpyAsync` (instead of `cudaMemcpy`)

Naturally your work also needs to be broken up in a separable way. This normally means that if your kernel performs a specific function, you may need multiple invocations of that kernel so that each invocation works on a separate piece of data. This allows us, for example, to copy data block B to the device while the first kernel invocation is working on data block A; in so doing we have the opportunity to overlap the copy of data block B with the kernel processing of data block A (see the sketch at the end of this answer).

The main differences between `cudaMemcpyAsync` and `cudaMemcpy` are that:

1. It can be issued in any stream (it takes a stream parameter).
2. Normally, it returns control to the host *immediately* (just like a kernel launch does) rather than waiting for the data copy to complete.

Item 1 is necessary so that data copy can be overlapped with kernel computation. Item 2 is necessary so that data copy can be overlapped with host activity.

Although the concepts of copy/compute overlap are fairly straightforward, in practice the implementation requires some work. For additional references, see:

1. The [overlap copy/compute section](http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#asynchronous-transfers-and-overlapping-transfers-with-computation) of the CUDA best practices guide.
2. Sample code showing a [basic implementation of copy/compute overlap](http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-multi-copy-and-compute).
3. Sample code showing a full [multi/concurrent kernel copy/compute overlap scenario](http://docs.nvidia.com/cuda/cuda-samples/index.html#concurrent-kernels).

Note that some of the above discussion is predicated on a device of compute capability 2.0 or greater (e.g. for concurrent kernels). Also, different devices may have one or two copy engines, meaning simultaneous copy *to* the device and copy *from* the device is only possible on certain devices.
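As a concrete illustration, here is a minimal sketch of the pattern described above. It is not the code from the linked samples; the kernel (`scale`), the chunk count/size, and the error-checking macro are made up for the example. Each chunk of data gets its own stream, the host buffer is pinned with `cudaHostAlloc()`, and copies are issued with `cudaMemcpyAsync()`, so the copy of one chunk can overlap the kernel working on another chunk, and all of it can overlap the host code that follows the launch loop:

```c++
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err = (call);                                       \
        if (err != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                 \
                    cudaGetErrorString(err), __FILE__, __LINE__);       \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Trivial kernel: each invocation processes one independent chunk.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int nChunks   = 4;          // independent pieces of work
    const int chunkSize = 1 << 20;    // elements per chunk
    const size_t chunkBytes = chunkSize * sizeof(float);

    // 1. Pinned (page-locked) host buffer, required for truly async copies.
    float *h_data = nullptr;
    CHECK(cudaHostAlloc(&h_data, nChunks * chunkBytes, cudaHostAllocDefault));
    for (int i = 0; i < nChunks * chunkSize; ++i) h_data[i] = 1.0f;

    // One device buffer and one stream per chunk.
    float *d_data[nChunks];
    cudaStream_t streams[nChunks];
    for (int c = 0; c < nChunks; ++c) {
        CHECK(cudaMalloc(&d_data[c], chunkBytes));
        CHECK(cudaStreamCreate(&streams[c]));   // 2. streams separate the work
    }

    // 3. Issue H2D copy, kernel, and D2H copy for each chunk in its own
    //    stream. Work in different streams may overlap: e.g. the copy of
    //    chunk 1 can run while the kernel for chunk 0 executes.
    for (int c = 0; c < nChunks; ++c) {
        float *h_chunk = h_data + c * chunkSize;
        CHECK(cudaMemcpyAsync(d_data[c], h_chunk, chunkBytes,
                              cudaMemcpyHostToDevice, streams[c]));
        scale<<<(chunkSize + 255) / 256, 256, 0, streams[c]>>>(
            d_data[c], chunkSize, 2.0f);
        CHECK(cudaMemcpyAsync(h_chunk, d_data[c], chunkBytes,
                              cudaMemcpyDeviceToHost, streams[c]));
    }

    // All of the calls above returned immediately, so the host is free to
    // do other work here, overlapped with the copies and the kernels.

    CHECK(cudaDeviceSynchronize());   // wait for everything to finish
    printf("h_data[0] = %f (expected 2.0)\n", h_data[0]);

    for (int c = 0; c < nChunks; ++c) {
        CHECK(cudaFree(d_data[c]));
        CHECK(cudaStreamDestroy(streams[c]));
    }
    CHECK(cudaFreeHost(h_data));
    return 0;
}
```

Whether the copies and kernels actually overlap at runtime depends on the device (number of copy engines, concurrent kernel support), which you can confirm with a profiler such as Nsight Systems.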