The operation you need to perform a global summation across all the threads is known as a "parallel reduction". While you could use atomic operations to do this, I would not recommend it. There is a reduction kernel and a very good paper discussing the technique in the CUDA SDK; it is worth reading.

If I were writing code to do what you want, it would probably look like this:

```cuda
template <int blocksize>
__global__ void calcRatio(float *orig, float *modified, int size,
                          float *result, int *count, const float error)
{
    __shared__ volatile float buff[blocksize];

    int index  = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;

    // Each thread processes several entries and keeps a thread-local total
    // of the elements whose difference is below the error threshold.
    int localcount = 0;
    for (int i = index; i < size; i += stride) {
        float val = orig[i] - modified[i];
        localcount += (val < error);
        result[i] = val;
    }

    buff[threadIdx.x] = localcount;
    __syncthreads();

    // Parallel reduction in shared memory using one warp.
    if (threadIdx.x < warpSize) {
        // Fold the per-thread subtotals down to warpSize values.
        for (int i = threadIdx.x + warpSize; i < blocksize; i += warpSize)
            buff[threadIdx.x] += buff[i];

        // Tree-like reduction of the remaining 32 values within the warp.
        if (threadIdx.x < 16) buff[threadIdx.x] += buff[threadIdx.x + 16];
        if (threadIdx.x < 8)  buff[threadIdx.x] += buff[threadIdx.x + 8];
        if (threadIdx.x < 4)  buff[threadIdx.x] += buff[threadIdx.x + 4];
        if (threadIdx.x < 2)  buff[threadIdx.x] += buff[threadIdx.x + 2];
        if (threadIdx.x == 0) count[blockIdx.x] = buff[0] + buff[1];
    }
}
```

The first stanza does what your serial code does: it computes a difference and a *thread-local* total of the elements which are less than the error. Note that I have written this version so that each thread processes more than one entry of the input data. This has been done to help offset the computational cost of the parallel reduction that follows; the idea is that you would use fewer blocks and threads than there are entries in the input data set.

The second stanza is the reduction itself, done in shared memory. It is effectively a "tree-like" operation: the set of thread-local subtotals within a single block of threads is first summed down to 32 subtotals, then those subtotals are combined until there is one final subtotal, which is stored as the total for the *block*. You will wind up with a small list of subtotals in `count`, one for each block you launched, which can be copied back to the host, where the final result you need is calculated.

Please note I coded this in the browser and haven't compiled it, so there might be errors, but it should give an idea of how an "advanced" version of what you are trying to do would work.
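For completeness, below is a minimal host-side sketch of how such a kernel might be driven. It is not from the original answer: the array names, problem size, block and grid dimensions, and error threshold are placeholder assumptions chosen for illustration, and it assumes the kernel above is defined in the same `.cu` file. It launches fewer blocks than there are input elements, copies the per-block subtotals back, and performs the final summation on the host, as described above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main()
{
    const int   size      = 1 << 20;  // number of input elements (placeholder)
    const int   blocksize = 256;      // threads per block; matches the template argument
    const int   nblocks   = 64;       // deliberately fewer blocks than elements
    const float error     = 1e-3f;    // placeholder threshold

    // Placeholder input data; in real code these would hold your two images/arrays.
    std::vector<float> h_orig(size, 1.0f), h_modified(size, 1.0f);

    float *d_orig, *d_modified, *d_result;
    int   *d_count;
    cudaMalloc(&d_orig,     size    * sizeof(float));
    cudaMalloc(&d_modified, size    * sizeof(float));
    cudaMalloc(&d_result,   size    * sizeof(float));
    cudaMalloc(&d_count,    nblocks * sizeof(int));
    cudaMemcpy(d_orig,     h_orig.data(),     size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_modified, h_modified.data(), size * sizeof(float), cudaMemcpyHostToDevice);

    // Launch the kernel: each block writes one subtotal into d_count.
    calcRatio<blocksize><<<nblocks, blocksize>>>(d_orig, d_modified, size,
                                                 d_result, d_count, error);

    // Copy the per-block subtotals back and finish the reduction on the host.
    std::vector<int> h_count(nblocks);
    cudaMemcpy(h_count.data(), d_count, nblocks * sizeof(int), cudaMemcpyDeviceToHost);

    int total = 0;
    for (int i = 0; i < nblocks; ++i)
        total += h_count[i];

    printf("%d of %d differences are below the error threshold\n", total, size);

    cudaFree(d_orig); cudaFree(d_modified); cudaFree(d_result); cudaFree(d_count);
    return 0;
}
```

Error checking on the CUDA API calls is omitted for brevity; in real code you would check the return value of each `cudaMalloc`, `cudaMemcpy`, and the kernel launch.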