
    <p>In the optimized code you launch the kernel with blocks half as large as in the non-optimized implementation.</p>
    <p>Call the number of input elements handled by one block in the non-optimized code <code>work</code>, and call half of that size <code>unit</code>. Both names keep the same numerical values in the optimized code.</p>
    <p>In the non-optimized code each block runs as many threads as there are elements in <code>work</code>, that is, <code>blockDim = 2 * unit</code>. Each block simply copies its part of <code>g_idata</code> into a shared-memory array of size <code>2 * unit</code>.</p>
    <p>In the optimized code <code>blockDim = unit</code>, so each block has half as many threads and its shared-memory array is half as large. In line 3 the first summand comes from the even-numbered units and the second from the odd-numbered units. Each block therefore still covers <code>2 * unit</code> input elements, and all the data required for the reduction is taken into account.</p>
    <p>Example: if you run the non-optimized kernel with <code>blockDim=256=work</code> (a single block, <code>unit=128</code>), the optimized code has a single block with <code>blockDim=128=unit</code>. Since this block has <code>blockIdx=0</code>, the <code>*2</code> factor does not matter; the first thread computes <code>g_idata[0] + g_idata[0 + 128]</code>.</p>
    <p>If you have 512 elements and run the non-optimized code with 2 blocks of size 256 (<code>work=256</code>, <code>unit=128</code>), the optimized code also has 2 blocks, but now of size 128. The first thread in the second block (<code>blockIdx=1</code>) computes <code>g_idata[2*128] + g_idata[2*128+128]</code>.</p>
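    <p>The index arithmetic described above can be checked on the CPU. The following is a minimal sketch in plain C++, not the kernel itself: the function names <code>first_index</code> and <code>load_with_first_add</code> are hypothetical, the loops over <code>block</code> and <code>tid</code> stand in for <code>blockIdx</code> and <code>threadIdx</code>, and the index formula <code>block * (unit * 2) + tid</code> is an assumption inferred from the two examples, since the original kernel source is not shown here.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first global element read by thread `tid` of block `block`.
// `unit` plays the role of blockDim in the optimized kernel (assumed formula).
std::size_t first_index(std::size_t block, std::size_t tid, std::size_t unit) {
    return block * (unit * 2) + tid;
}

// CPU emulation of the "add while loading" step, for illustration only.
// Each emulated block covers work = 2 * unit input elements but stores only
// `unit` partial sums, mirroring the halved shared-memory array.
// Returns one vector per block: its shared-memory contents after the load.
std::vector<std::vector<int>> load_with_first_add(const std::vector<int>& g_idata,
                                                  std::size_t unit) {
    // The sketch assumes the input is a whole number of `work`-sized chunks.
    assert(unit > 0 && g_idata.size() % (2 * unit) == 0);
    const std::size_t num_blocks = g_idata.size() / (2 * unit);

    std::vector<std::vector<int>> shared(num_blocks, std::vector<int>(unit));
    for (std::size_t block = 0; block < num_blocks; ++block) {   // blockIdx
        for (std::size_t tid = 0; tid < unit; ++tid) {           // threadIdx
            const std::size_t i = first_index(block, tid, unit);
            // First summand: even-numbered unit; second: the odd unit after it.
            shared[block][tid] = g_idata[i] + g_idata[i + unit];
        }
    }
    return shared;
}
```

    <p>With 512 input elements and <code>unit = 128</code>, this emulation produces 2 blocks of 128 partial sums each, and thread 0 of block 1 reads indices 256 and 384, matching the second example above.</p>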