StackOverflow2013


plurals

POCUDA atomic and non atomic memory access
primarykey
Id
13085338
data
AcceptedAnswerId
0
AnswerCount
1
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2012-10-26T10:39:51.230
FavoriteCount
0
LastActivityDate
2012-10-26T11:53:14.747
LastEditDate
LastEditorUserId
0
OwnerUserId
1108399
ParentId
0
PostTypeId
1
Score
0
ViewCount
747
LastEditorDisplayName
text
Body
I have two CUDA functions that manipulate linked lists in global memory. The function <code>pmalloc</code> removes the head element of one of the lists. It first chooses a list and then calls <code>pmallocBucket</code>, which actually removes the head element. Should the chosen list be empty, <code>pmalloc</code> will try other lists. The <code>pfree</code> function, on the other hand, inserts a new head element into a list.

Mutual exclusion is achieved through semaphores, one per linked list. The semaphore implementation is taken from the book CUDA By Example. In some other test code, the semaphore works perfectly.

The problem I have with the code is the following: sometimes, several threads try to access the same linked list simultaneously. These accesses are successfully serialized by the semaphore, but sometimes a thread removes the same head element from the list as a previous thread did. This may happen immediately afterwards, or one or more other threads may access the list in between. The thread then calls <code>free</code> on a memory area that is no longer allocated, and my program crashes.

Here are the functions mentioned above. <code>mmd</code> is a structure in global memory that is initialized by another function.

<pre><code>extern __device__ void wait(int* s) {
    while(atomicCAS(s, 0, 1) != 0);
}

extern __device__ void signal(int* s) {
    atomicExch(s, 0);
}

__device__ void pfree(Expression* node) {
    LinkedList* l = (LinkedList*) malloc(sizeof(LinkedList));
    l->cell = node;
    node->type = EMPTY;
    node->funcidx = 0;
    node->name = NULL;
    node->len = 0;
    node->value = 0;
    node->numParams = 0;
    free(node->params);
    int targetBin = (blockIdx.x * mmd.bucketSize + threadIdx.x) / BINSIZE;
    /*
     * The for loop and subsequent if are necessary to make sure that only one
     * thread in a warp is actively waiting for the lock on the semaphore.
     * Leaving this out will result in massive headaches.
     * See "CUDA by example", p. 273
     */
    for(int i = 0; i < WARPSIZE; i++) {
        if(((threadIdx.x + blockIdx.x * blockDim.x) % WARPSIZE) == i) {
            wait(&mmd.bucketSemaphores[targetBin]);
            l->next = mmd.freeCells[targetBin];
            mmd.freeCells[targetBin] = l;
            signal(&mmd.bucketSemaphores[targetBin]);
        }
    }
}

__device__ Expression* pmalloc() {
    Expression* retval = NULL;
    int i = 0;
    int bucket = (blockIdx.x * mmd.bucketSize + threadIdx.x) / BINSIZE;
    while(retval == NULL && i < mmd.numCellBins) {
        retval = pmallocBucket((i + bucket) % mmd.numCellBins);
        i++;
    }
    if(retval == NULL) {
        printf("(%u, %u) Out of memory\n", blockIdx.x, threadIdx.x);
    }
    return retval;
}

__device__ Expression* pmallocBucket(int bucket) {
    Expression* retval = NULL;
    if(bucket < mmd.numCellBins) {
        LinkedList* l = NULL;
        for(int i = 0; i < WARPSIZE; i++) {
            if(((threadIdx.x + blockIdx.x * blockDim.x) % WARPSIZE) == i) {
                wait(&mmd.bucketSemaphores[bucket]);
                l = mmd.freeCells[bucket];
                if(l != NULL) {
                    retval = l->cell;
                    mmd.freeCells[bucket] = l->next;
                }
                signal(&mmd.bucketSemaphores[bucket]);
                free(l);
            }
        }
    }
    return retval;
}
</code></pre>

I am quite at a loss. I do not know what is actually going wrong, and all my attempts so far to clear it up have been unsuccessful. Any help is greatly appreciated.

P.S.: Yes, I do realize that the use of atomic operations and semaphores is less than ideal for CUDA applications. But in this case I have no idea yet how this could be implemented differently, and my project has an absolutely fixed deadline that is approaching fast, so this will have to do.
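The symptom described above (two threads popping the same head even though the lock serializes them) is consistent with, though not proven to be caused by, a thread reading a stale copy of the list head after it acquires the lock. The following is a minimal sketch of how that can be ruled out: the lock helpers add `__threadfence()` and the head pointer is accessed through a `volatile` pointer. `Node`, `lockAcquire`, `lockRelease`, `buildList`, `popAll` and `countDoublePops` are hypothetical names introduced only for this illustration; they are not part of the code in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARPSIZE 32

// Hypothetical stand-in for the question's LinkedList type.
struct Node { int id; Node* next; };

__device__ void lockAcquire(int* s) {
    while (atomicCAS(s, 0, 1) != 0) { /* spin until the lock is free */ }
    __threadfence();  // order the acquire before the reads that follow it
}

__device__ void lockRelease(int* s) {
    __threadfence();  // make list updates visible before the lock is freed
    atomicExch(s, 0);
}

// Single-threaded setup: chain n nodes into one list.
__global__ void buildList(Node* nodes, int n, Node** head) {
    for (int i = 0; i < n; i++) {
        nodes[i].id = i;
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    *head = nodes;
}

// Every thread pops one node and records which node it received.
__global__ void popAll(Node** head, int* sem, int* timesPopped) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    Node* volatile* vhead = head;  // force real loads/stores of the head
    // Same per-lane serialization as in the question: only one thread
    // of a warp contends for the lock at any time.
    for (int lane = 0; lane < WARPSIZE; lane++) {
        if (tid % WARPSIZE == lane) {
            lockAcquire(sem);
            Node* n = *vhead;
            if (n != NULL) {
                *vhead = n->next;
                atomicAdd(&timesPopped[n->id], 1);
            }
            lockRelease(sem);
        }
    }
}

// Host helper: returns how many nodes were NOT popped exactly once.
int countDoublePops(int blocks, int threads) {
    int n = blocks * threads;
    Node* nodes; Node** head; int* sem; int* timesPopped;
    cudaMalloc(&nodes, n * sizeof(Node));
    cudaMalloc(&head, sizeof(Node*));
    cudaMalloc(&sem, sizeof(int));
    cudaMalloc(&timesPopped, n * sizeof(int));
    cudaMemset(sem, 0, sizeof(int));
    cudaMemset(timesPopped, 0, n * sizeof(int));

    buildList<<<1, 1>>>(nodes, n, head);
    popAll<<<blocks, threads>>>(head, sem, timesPopped);

    int* host = new int[n];
    cudaMemcpy(host, timesPopped, n * sizeof(int), cudaMemcpyDeviceToHost);
    int bad = 0;
    for (int i = 0; i < n; i++) {
        if (host[i] != 1) bad++;
    }
    delete[] host;
    cudaFree(nodes); cudaFree(head); cudaFree(sem); cudaFree(timesPopped);
    return bad;
}

int main() {
    printf("nodes not popped exactly once: %d\n", countDoublePops(4, 64));
    return 0;
}
```

Applied to the code in the question, the same idea would mean adding a fence after the `atomicCAS` loop in `wait`, adding one before the `atomicExch` in `signal`, and reading `mmd.freeCells[...]` through a `volatile` access. Whether this removes the double removal in the original program is an assumption that would need to be tested there.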
Tags
<cuda><malloc><free><semaphore><atomic>
Title
CUDA atomic and non atomic memory access
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USSarek
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. This table or related slice is empty.
