  1. Moving elements between arrays in a CUDA kernel
<p>I am stuck on something very simple and would like an opinion. I have a very simple CUDA kernel that copies the elements from one array to another (there is a reason I want to do it this way):</p>
<pre><code>__global__ void kernelExample( float* A, float* B, float* C, int rows, int cols )
{
    int r = blockIdx.y * blockDim.y + threadIdx.y; // vertical dim in block
    int c = blockIdx.x * blockDim.x + threadIdx.x; // horizontal dim in block

    if ( r &lt; rows &amp;&amp; c &lt; cols) {
        // row-major order
        C[ c + r*cols ] = A[ c + r*cols ];
    }
    //__syncthreads();
}
</code></pre>
<p>I am getting unsatisfactory results. Any suggestions, please?</p>
<p>The kernel is called like this:</p>
<pre><code>int numElements = rows * cols;
int threadsPerBlock = 256;
int blocksPerGrid = ceil( (double) numElements / threadsPerBlock);
kernelExample&lt;&lt;&lt;blocksPerGrid , threadsPerBlock &gt;&gt;&gt;( d_A, d_B, d_C, rows, cols );
</code></pre>
<p><strong>Updated</strong> (after Eric's help):</p>
<pre><code>int numElements = rows * cols;
int threadsPerBlock = 32; // talonmies' comment
int blocksPerGrid = ceil( (double) numElements / threadsPerBlock);
dim3 dimBlock( threadsPerBlock,threadsPerBlock );
dim3 dimGrid( blocksPerGrid,blocksPerGrid );
kernelExample&lt;&lt;&lt;dimBlock, dimBlock&gt;&gt;&gt;( d_A, d_B, d_C, rows, cols );
</code></pre>
<p>For example, given the matrix A</p>
<pre><code>A = [ 0 1 2 1 0 2 0 0 2 0 0 1 2 1 2 2 2 2 0 0 2 1 2 2 3 1 2 2 2 2 ]
</code></pre>
<p>the returned matrix C is</p>
<pre><code>C = [ 0 1 2 1 0 2 0 0 2 0 0 1 2 1 2 2 2 2 0 0 2 1 2 2 3 1 2 2 2 2 ]
</code></pre>
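<p>One likely cause of the first result: the kernel derives r and c from 2D indices, but the first launch uses a 1D grid of 1D blocks, so blockIdx.y and threadIdx.y are always 0 and only row 0 is ever written. In the updated launch, dimGrid is computed but dimBlock is passed as both the grid and the block. Below is a minimal sketch of a 2D launch with the grid sized per dimension. It assumes a 5 x 6 matrix (the question gives 30 values but not the shape), a 16 x 16 block, and placeholder input data; copyKernel, ceilDiv, and CUDA_CHECK are hypothetical names, and the unused B argument is dropped.</p>

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",        \
                         cudaGetErrorString(err_), __LINE__);         \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Same copy as in the question, without the unused B argument.
__global__ void copyKernel(const float* A, float* C, int rows, int cols)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    if (r < rows && c < cols) {
        C[c + r * cols] = A[c + r * cols];          // row-major order
    }
}

// Integer ceiling division, used to size the grid in each dimension.
static int ceilDiv(int n, int d) { return (n + d - 1) / d; }

int main()
{
    const int rows = 5, cols = 6;  // assumed shape, not stated in the question
    const int numElements = rows * cols;
    const size_t bytes = numElements * sizeof(float);

    // Placeholder input; h_C starts at -1 so unwritten elements are visible.
    std::vector<float> h_A(numElements), h_C(numElements, -1.0f);
    for (int i = 0; i < numElements; ++i) h_A[i] = static_cast<float>(i % 4);

    float *d_A = nullptr, *d_C = nullptr;
    CUDA_CHECK(cudaMalloc(&d_A, bytes));
    CUDA_CHECK(cudaMalloc(&d_C, bytes));
    CUDA_CHECK(cudaMemcpy(d_A, h_A.data(), bytes, cudaMemcpyHostToDevice));

    // x covers columns and y covers rows, matching the kernel's indexing.
    dim3 dimBlock(16, 16);
    dim3 dimGrid(ceilDiv(cols, dimBlock.x), ceilDiv(rows, dimBlock.y));

    copyKernel<<<dimGrid, dimBlock>>>(d_A, d_C, rows, cols);
    CUDA_CHECK(cudaGetLastError());       // reports launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised during execution
    CUDA_CHECK(cudaMemcpy(h_C.data(), d_C, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < numElements; ++i) {
        if (h_C[i] != h_A[i]) ++mismatches;
    }
    std::printf("mismatches: %d\n", mismatches);

    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_C));
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

<p>Sizing the grid from cols and rows separately, rather than from rows * cols in both dimensions, avoids launching far more blocks than the matrix needs. Checking cudaGetLastError() right after the launch also surfaces an invalid configuration that would otherwise fail silently and leave C unchanged.</p>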