<p>In the main function, elements with an index greater than <code>blockDim.x*(gridDim.x-1)+(blockDim.x-1)</code> (the index of the last thread in the grid) are included in the calculation, whereas in the method you've provided they are not.</p>
<p>Suppose you have <code>N=1024</code> and you invoke your function with a grid of 8 blocks, each with 32 threads. In your main function, thread <code>i</code> collects and adds up the data in <code>*input</code> at elements <code>i</code>, <code>i+8*32</code>, <code>i+2*(8*32)</code> and <code>i+3*(8*32)</code>. Your code, on the other hand, reads only element <code>i</code>. In other words, it covers only the first <code>32*8</code> elements of <code>*input</code> and ignores the remaining <code>1024-32*8</code>.</p>
<p><strong>In more detail:</strong></p>
<p>Code example 1 works like this:</p>
<pre><code>blockIdx.x=0 threadIdx.x=0
for ( i = 0; i &lt; 1024; i += 32*8 )
    temp += input[i]; // temp = input[0]+input[256]+input[512]+input[768]
buffer[0] = temp;     // = input[0]+input[256]+input[512]+input[768]

blockIdx.x=0 threadIdx.x=1
for ( i = 1; i &lt; 1024; i += 32*8 )
    temp += input[i]; // temp = input[1]+input[257]+input[513]+input[769]
buffer[1] = temp;     // = input[1]+input[257]+input[513]+input[769]

blockIdx.x=0 threadIdx.x=2
for ( i = 2; i &lt; 1024; i += 32*8 )
    temp += input[i]; // temp = input[2]+input[258]+input[514]+input[770]
buffer[2] = temp;     // = input[2]+input[258]+input[514]+input[770]
</code></pre>
<p>...</p>
<pre><code>// last thread
blockIdx.x=7 threadIdx.x=31
for ( i = 7*32+31; i &lt; 1024; i += 32*8 )
    temp += input[i]; // temp = input[255]+input[511]+input[767]+input[1023]
buffer[255] = temp;   // = input[255]+input[511]+input[767]+input[1023]
</code></pre>
<p>Code example 2 works like this:</p>
<pre><code>blockIdx.x=0 threadIdx.x=0
i = 0*32+0; // = 0
buffer[0] = input[0];

blockIdx.x=0 threadIdx.x=1
i = 0*32+1; // = 1
buffer[1] = input[1];
</code></pre>
<p>...</p>
<pre><code>// last thread
blockIdx.x=7 threadIdx.x=31
i = 7*32+31; // = 255
buffer[255] = input[255];
</code></pre>
<p>As you can see, the first code example goes over all elements of the <code>input</code> array, but the second code example doesn't.</p>
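<p>To check the traces above without a GPU, here is a minimal host-side sketch in plain C++ that emulates the per-thread indexing of both variants. It is not the original kernel code: the function names <code>emulate_strided</code>, <code>emulate_single</code> and <code>total</code>, and the parameters <code>numBlocks</code> and <code>threadsPerBlock</code> (standing in for <code>gridDim.x</code> and <code>blockDim.x</code>), are made up for illustration, and the threads are simply run one after another in nested loops.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Emulates code example 1: every thread starts at its global index and
// strides through the input by the total number of threads in the grid.
std::vector<int> emulate_strided(const std::vector<int>& input,
                                 int numBlocks, int threadsPerBlock) {
    assert(numBlocks > 0 && threadsPerBlock > 0);
    const int totalThreads = numBlocks * threadsPerBlock;
    std::vector<int> buffer(totalThreads, 0);
    for (int block = 0; block < numBlocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            const int tid = block * threadsPerBlock + thread;
            int temp = 0;
            for (std::size_t i = tid; i < input.size(); i += totalThreads) {
                temp += input[i];
            }
            buffer[tid] = temp;
        }
    }
    return buffer;
}

// Emulates code example 2: every thread reads exactly one element,
// the one at its own global index.
std::vector<int> emulate_single(const std::vector<int>& input,
                                int numBlocks, int threadsPerBlock) {
    assert(numBlocks > 0 && threadsPerBlock > 0);
    const int totalThreads = numBlocks * threadsPerBlock;
    std::vector<int> buffer(totalThreads, 0);
    for (int block = 0; block < numBlocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            const int tid = block * threadsPerBlock + thread;
            if (static_cast<std::size_t>(tid) < input.size()) {
                buffer[tid] = input[tid];
            }
        }
    }
    return buffer;
}

// Sums a buffer so the two variants can be compared against the full input.
long long total(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}
```

<p>With <code>input[i] = i</code> for <code>N=1024</code>, 8 blocks and 32 threads per block, the strided variant gives <code>buffer[0] = 0+256+512+768 = 1536</code> and its buffer sums to 523776, the sum of the whole input. The single-element variant gives <code>buffer[255] = 255</code> and its buffer sums to only 32640, the sum of the first 256 elements.</p>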
 
