<p>As <a href="https://stackoverflow.com/users/681865/talonmies">talonmies</a> pointed out, you're accessing and processing your data byte-wise, which is far from optimal. Several techniques you may want to consider, such as instruction-level parallelism and buffered reads/writes, are summarized in the NVIDIA webinar <a href="http://nvidia.fullviewmedia.com/gtc2010/0922-a5-2238.html" rel="nofollow noreferrer">Better Performance at Lower Occupancy</a> by Vasily Volkov.</p>
<p>In a nutshell, what you want to do is, in each thread, read <em>several</em> <code>uint4</code> in a coalesced way, process them, and only then store them.</p>
<p><strong>Update</strong></p>
<p>Does it make any difference if you rewrite your code as follows?</p>
<pre><code>__global__ void logical_and(unsigned int* in, unsigned int* out, int N) {
    // First word handled by this thread; each block covers blockDim.x*chunksize words.
    int idx = blockIdx.x*blockDim.x*chunksize+threadIdx.x;
    unsigned int buff[chunksize];
    // Load chunksize words; the stride of blockDim.x keeps the accesses coalesced.
    #pragma unroll
    for ( int k = 0 ; k &lt; chunksize ; k++ )
        buff[k] = in[ blockDim.x*k + idx ];
    // AND with the second operand, which starts N words after the first.
    #pragma unroll
    for ( int k = 0 ; k &lt; chunksize ; k++ )
        buff[k] &amp;= in[ blockDim.x*k + idx + N ];
    // Only now store the results.
    #pragma unroll
    for ( int k = 0 ; k &lt; chunksize ; k++ )
        out[ blockDim.x*k + idx ] = buff[k];
}
</code></pre>
<p>Note that I've assumed <code>chunksize</code> is a macro you've <code>#define</code>d somewhere, e.g.</p>
<pre><code>#define chunksize 4
</code></pre>
<p>You also have to divide the number of blocks you launch, and <code>N</code>, by that number. I've also used <code>unsigned int</code>, which is just four packed <code>uchar</code>. In your calling function, you may have to cast your pointers accordingly.</p>
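<p>For reference, here is a minimal host-side sketch of how such a kernel might be launched. The block size, the buffer names (<code>h_in</code>, <code>d_in</code>, ...), the layout with both operands stored back to back, and reading <code>N</code> as the operand length in <code>unsigned int</code> words are all assumptions on my part, not taken from your code. The kernel has no bounds check, so the word count is assumed to be a multiple of <code>blockDim.x*chunksize</code>.</p>
<pre><code>#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;
#include &lt;cuda_runtime.h&gt;

#define chunksize 4

__global__ void logical_and(unsigned int* in, unsigned int* out, int N) {
    int idx = blockIdx.x*blockDim.x*chunksize+threadIdx.x;
    unsigned int buff[chunksize];
    #pragma unroll
    for ( int k = 0 ; k &lt; chunksize ; k++ )
        buff[k] = in[ blockDim.x*k + idx ];
    #pragma unroll
    for ( int k = 0 ; k &lt; chunksize ; k++ )
        buff[k] &amp;= in[ blockDim.x*k + idx + N ];
    #pragma unroll
    for ( int k = 0 ; k &lt; chunksize ; k++ )
        out[ blockDim.x*k + idx ] = buff[k];
}

int main() {
    const int threads = 256;                           // assumed block size
    const int blocks  = 64;                            // assumed grid size
    const int nWords  = blocks * threads * chunksize;  // words per operand
    const size_t nBytes = (size_t)nWords * sizeof(unsigned int);

    // Host data stays byte-wise; both operands sit back to back (assumed layout).
    unsigned char* h_in  = (unsigned char*)malloc(2 * nBytes);
    unsigned char* h_out = (unsigned char*)malloc(nBytes);
    for (size_t i = 0; i &lt; nBytes; i++) {
        h_in[i]          = (unsigned char)(i &amp; 0xFF);
        h_in[nBytes + i] = 0x0F;
    }

    unsigned char *d_in, *d_out;
    cudaMalloc((void**)&amp;d_in, 2 * nBytes);
    cudaMalloc((void**)&amp;d_out, nBytes);
    cudaMemcpy(d_in, h_in, 2 * nBytes, cudaMemcpyHostToDevice);

    // Cast the byte pointers to unsigned int* and pass the offset in words.
    logical_and&lt;&lt;&lt;blocks, threads&gt;&gt;&gt;((unsigned int*)d_in, (unsigned int*)d_out, nWords);
    cudaMemcpy(h_out, d_out, nBytes, cudaMemcpyDeviceToHost);

    // Compare against a byte-wise AND computed on the host.
    int errors = 0;
    for (size_t i = 0; i &lt; nBytes; i++)
        if (h_out[i] != (h_in[i] &amp; h_in[nBytes + i])) errors++;
    printf("mismatching bytes: %d\n", errors);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}
</code></pre>
<p>Each block now covers <code>threads*chunksize</code> words, so the grid is <code>chunksize</code> times smaller than it would be with one word per thread. Pointers returned by <code>cudaMalloc</code> are aligned well enough for the cast to <code>unsigned int*</code>; if your buffers are offsets into a larger allocation, you would need to check the alignment yourself.</p>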
 
