Note that there are some explanatory texts on larger screens.

plurals
  1. POCUDA atomic and non atomic memory access
    primarykey
    data
    text
    <p>I have two CUDA functions that manipulate linked lists in global memory. The function <code>pmalloc</code> removes the head element of one of the lists. It first chooses a list and then calls <code>pmallocBucket</code> which actually removes the head element. Should the chosen list be empty, <code>pmalloc</code> will try other lists. The <code>pfree</code> function on the other hand will insert a new head element into a list.</p> <p>Mutual exclusion is achieved through semaphores, one each for each linked list. The implementation for the semaphores is from the book <em>CUDA By Example</em>. In some other test code, the semaphore works perfectly.</p> <p>The problem I have with the code is the following: Sometimes, several threads will try to access the same linked list simultaneously. These accesses are succesfully sequentialized by the semaphore, but sometimes, a thread will remove the same head element from the list as a previous thread. This may happen immediately consecutively, or there can be one or more other threads in between. The thread will then <code>free</code> an unallocated memory area and my program crashes.</p> <p>Here are the mentioned functions. <code>mmd</code> is a structure in global memory that is initialized from another function.</p> <pre><code>extern __device__ void wait(int* s) { while(atomicCAS(s, 0, 1) != 0); } extern __device__ void signal(int* s) { atomicExch(s, 0); } __device__ void pfree(Expression* node) { LinkedList* l = (LinkedList*) malloc(sizeof(LinkedList)); l-&gt;cell = node; node-&gt;type = EMPTY; node-&gt;funcidx = 0; node-&gt;name = NULL; node-&gt;len = 0; node-&gt;value = 0; node-&gt;numParams = 0; free(node-&gt;params); int targetBin = (blockIdx.x * mmd.bucketSize + threadIdx.x) / BINSIZE; /* * The for loop and subsequent if are necessary to make sure that only one * thread in a warp is actively waiting for the lock on the semaphore. * Leaving this out will result in massive headaches. * See "CUDA by example", p. 273 */ for(int i = 0; i &lt; WARPSIZE; i++) { if(((threadIdx.x + blockIdx.x * blockDim.x) % WARPSIZE) == i) { wait(&amp;mmd.bucketSemaphores[targetBin]); l-&gt;next = mmd.freeCells[targetBin]; mmd.freeCells[targetBin] = l; signal(&amp;mmd.bucketSemaphores[targetBin]); } } } __device__ Expression* pmalloc() { Expression* retval = NULL; int i = 0; int bucket = (blockIdx.x * mmd.bucketSize + threadIdx.x) / BINSIZE; while(retval == NULL &amp;&amp; i &lt; mmd.numCellBins) { retval = pmallocBucket((i + bucket) % mmd.numCellBins); i++; } if(retval == NULL) { printf("(%u, %u) Out of memory\n", blockIdx.x, threadIdx.x); } return retval; } __device__ Expression* pmallocBucket(int bucket) { Expression* retval = NULL; if(bucket &lt; mmd.numCellBins) { LinkedList* l = NULL; for(int i = 0; i &lt; WARPSIZE; i++) { if(((threadIdx.x + blockIdx.x * blockDim.x) % WARPSIZE) == i) { wait(&amp;mmd.bucketSemaphores[bucket]); l = mmd.freeCells[bucket]; if(l != NULL) { retval = l-&gt;cell; mmd.freeCells[bucket] = l-&gt;next; } signal(&amp;mmd.bucketSemaphores[bucket]); free(l); } } } return retval; } </code></pre> <p>I am quite at a loss. I do not know what is actually going wrong and all my attempts so far to clear it up have been unsuccesful. Any help is greatly appreciated.</p> <p>P. S.: Yes, I do realize that the use of atomic operations and semaphores is less than ideal for CUDA applications. But in this case, as of yet I have no idea how this could be implemented differently and my project is on an absolutely fixed deadline that is approaching really fast, so this will have to do.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload