CUDA shared memory overwrite?
<p>I am trying to write a parallel prefix scan in CUDA by following this <a href="http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html" rel="nofollow">tutorial</a>.</p>
<p>I am trying the work-inefficient "double-buffered" version explained in the tutorial.</p>
<p>This is what I have:</p>
<pre><code>// double buffered naive.
// d = number of iterations, N - size, and input.
__global__ void prefixsum(int* in, int d, int N)
{
    //get the block index
    int idx = blockIdx.x*blockDim.x + threadIdx.x;

    // allocate shared memory
    extern __shared__ int temp_in[], temp_out[];

    // copy data to it.
    temp_in[idx] = in[idx];
    temp_out[idx] = 0;

    // block until all threads copy
    __syncthreads();

    int i = 1;
    for (i; i&lt;=d; i++)
    {
        if (idx &lt; N+1 &amp;&amp; idx &gt;= (int)pow(2.0f,(float)i-1))
        {
            // copy new result to temp_out
            temp_out[idx] += temp_in[idx - (int)pow(2.0f,(float)i-1)] + temp_in[idx];
        }
        else
        {
            // if the element is to remain unchanged, copy the same thing
            temp_out[idx] = temp_in[idx];
        }
        // block until all threads do this
        __syncthreads();
        // copy the result to temp_in for next iteration
        temp_in[idx] = temp_out[idx];
        // wait for all threads to do so
        __syncthreads();
    }

    //finally copy everything back to global memory
    in[idx] = temp_in[idx];
}
</code></pre>
<p>Can you point out what's wrong with this? I have written comments for what I think should happen.</p>
<p>This is the kernel invocation:</p>
<pre><code>prefixsum&lt;&lt;&lt;dimGrid,dimBlock&gt;&gt;&gt;(d_arr, log(SIZE)/log(2), N);
</code></pre>
<p>These are the grid and block allocations:</p>
<pre><code>dim3 dimGrid(numBlocks);
dim3 dimBlock(numThreadsPerBlock);
</code></pre>
<p>The problem is that I don't get the correct output for any input that is more than 8 elements long.</p>
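<p>For comparison, here is a minimal sketch of a double-buffered inclusive scan that avoids some likely trouble spots in the kernel above. It is not a confirmed diagnosis. Three things stand out in the posted code: the two <code>extern __shared__</code> arrays are declared without explicit offsets, so they start at the same address and alias each other; the launch passes no dynamic shared-memory size as the third <code>&lt;&lt;&lt;...&gt;&gt;&gt;</code> argument; and shared memory is indexed with the global index rather than <code>threadIdx.x</code>. The names <code>prefixsum_block</code> and <code>scan_on_gpu</code> are hypothetical. The sketch assumes a single block, at most 1024 elements, and omits CUDA error checking for brevity.</p>

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical single-block inclusive scan (naive, double buffered).
// Assumes n <= blockDim.x and that the launch supplies
// 2 * n * sizeof(int) bytes of dynamic shared memory.
__global__ void prefixsum_block(int* data, int n)
{
    // A single dynamic shared array, split by hand into two buffers
    // of n ints each, so the buffers cannot alias.
    extern __shared__ int temp[];

    int tid = threadIdx.x;   // shared memory is per block: index by threadIdx
    int pout = 0;            // buffer written in the current step
    int pin  = 1;            // buffer read in the current step

    if (tid < n) temp[pout * n + tid] = data[tid];
    __syncthreads();

    for (int offset = 1; offset < n; offset <<= 1) {
        // Swap the roles of the two buffers.
        pout = 1 - pout;
        pin  = 1 - pout;

        if (tid < n) {
            if (tid >= offset)
                temp[pout * n + tid] = temp[pin * n + tid] + temp[pin * n + tid - offset];
            else
                temp[pout * n + tid] = temp[pin * n + tid];   // unchanged element
        }
        // Every thread reaches this barrier (it sits outside the branch).
        __syncthreads();
    }

    if (tid < n) data[tid] = temp[pout * n + tid];
}

// Hypothetical host helper: copies the input to the device, runs one block,
// and returns the scanned values. Error checking is omitted for brevity.
std::vector<int> scan_on_gpu(std::vector<int> h)
{
    int n = static_cast<int>(h.size());
    if (n == 0) return h;
    assert(n <= 1024);       // assumed per-block thread limit

    size_t bytes = n * sizeof(int);
    int* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    // Third launch argument: dynamic shared memory for both buffers.
    prefixsum_block<<<1, n, 2 * bytes>>>(d, n);

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

<p>With this sketch, <code>scan_on_gpu({1, 2, 3, 4})</code> would be expected to return <code>{1, 3, 6, 10}</code>. Inputs spanning more than one block need an extra pass that combines per-block totals, because <code>__syncthreads()</code> and shared memory only work within a single block.</p>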