
CUDA - Multiprocessors, Warp size and Maximum Threads Per Block: What is the exact relationship?
I know that a CUDA GPU contains multiprocessors, which in turn contain the CUDA cores. At my workplace I am working with a GTX 590, which has 512 CUDA cores and 16 multiprocessors, and a warp size of 32. This means there are 32 CUDA cores in each multiprocessor, which all execute exactly the same code in the same warp. Finally, the maximum threads per block size is 1024.

My question is how the block size, the multiprocessor count and the warp size are exactly related. Let me describe my understanding of the situation: for example, I allocate N blocks with the maximum threadsPerBlock size of 1024 on the GTX 590. As far as I understand from the CUDA programming guide and from other sources, the blocks are first enumerated by the hardware. In this case, 16 of the N blocks are assigned to different multiprocessors. Each block contains 1024 threads, and the hardware scheduler assigns 32 of these threads to the 32 cores in a single multiprocessor. The threads in the same multiprocessor (warp) process the same line of code and use the shared memory of the current multiprocessor. If the current 32 threads encounter an off-chip operation like a memory read or write, they are replaced with another group of 32 threads from the current block. So there are actually only 32 threads in a single block that are *exactly* running in parallel on a multiprocessor at any given time, not the whole 1024. Finally, if a block is completely processed by a multiprocessor, a new thread block from the list of N thread blocks is plugged into the current multiprocessor. And in total there are 512 threads running in parallel on the GPU during the execution of the CUDA kernel. (I know that if a block uses more registers than are available on a single multiprocessor, it is divided to work on two multiprocessors, but let's assume that each block fits into a single multiprocessor in our case.)

So, is my model of CUDA parallel execution correct? If not, what is wrong or missing? I want to fine-tune the current project I am working on, so I need the most accurate working model of the whole thing.
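To make the numbers above concrete, here is a minimal sketch in CUDA C++ (assuming the standard runtime API; the kernel `fillIndices` and the problem size are placeholders for illustration, not part of the original question) that queries the multiprocessor count, warp size and maximum threads per block, and launches blocks of that maximum size:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: each thread writes its global index.
__global__ void fillIndices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // The three quantities discussed above, as reported by the runtime.
    printf("multiprocessors (SMs):  %d\n", prop.multiProcessorCount);
    printf("warp size:              %d\n", prop.warpSize);
    printf("max threads per block:  %d\n", prop.maxThreadsPerBlock);

    // Launch enough blocks of maxThreadsPerBlock threads to cover n elements;
    // the hardware splits each block into warps of prop.warpSize threads.
    const int n = 1 << 20;                       // example problem size (assumption)
    const int threadsPerBlock = prop.maxThreadsPerBlock;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));
    fillIndices<<<blocks, threadsPerBlock>>>(d_out, n);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

With a warp size of 32, each 1024-thread block in this launch is divided into 32 warps, which is the granularity at which the scheduler swaps threads in and out as described above.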