If your pairs of matrices are stored contiguously, and if you are accessing the elements linearly by thread index, then you will not have shared memory bank conflicts.

In other words, if you have:

```
A[0]  <- mat1 element1
A[1]  <- mat1 element2
A[2]  <- mat1 element3
...
A[15] <- mat1 element16
A[16] <- mat2 element1
A[17] <- mat2 element2
...
A[33] <- mat2 element18
```

And you access this using:

```cuda
float element;
element = A[pairindex * 34 + matindex * 16 + threadIdx.x];
```

Then adjacent threads are accessing adjacent elements in the matrix and you do not have conflicts.

In response to your comments (below), it does seem that you are mistaken in your understanding. It is true that there are 16 banks (on the current generation; 32 on the next generation, Fermi), but consecutive 32-bit words reside in consecutive banks, i.e. the address space is interleaved across the banks. This means that provided you always have an array index that can be decomposed to `x + threadIdx.x` (where `x` does not depend on threadIdx.x, or at least is constant across groups of 16 threads), you will not have bank conflicts.

When you access the matrices further along the array, you still access them in a contiguous chunk, and hence you will not have bank conflicts. It is only when you start accessing non-adjacent elements that you will have bank conflicts.

The *reduction* sample in the SDK illustrates bank conflicts very well by building from a naive implementation up to an optimised implementation; it is possibly worth taking a look at.
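To make the conflict-free pattern above concrete, here is a minimal CUDA sketch. The kernel name (`readPairs`), the staging loop, and the fixed sizes (four pairs of 34 floats, matching the layout above) are illustrative assumptions, not anything from the original question:

```cuda
#include <cuda_runtime.h>

// Assumed sizes for illustration, matching the layout above:
// mat1 has 16 elements and mat2 has 18, so each pair spans 34 floats.
#define NUM_PAIRS   4
#define PAIR_STRIDE 34
#define MAT2_OFFSET 16

// Hypothetical kernel: `in` is assumed to hold NUM_PAIRS * PAIR_STRIDE floats.
__global__ void readPairs(const float *in, float *out, int pairindex, int matindex)
{
    __shared__ float A[NUM_PAIRS * PAIR_STRIDE];

    // Stage into shared memory with a linear index: thread i touches word i,
    // so consecutive threads hit consecutive banks -- no conflicts.
    for (int i = threadIdx.x; i < NUM_PAIRS * PAIR_STRIDE; i += blockDim.x)
        A[i] = in[i];
    __syncthreads();

    // The access pattern from the answer: the index decomposes to
    // x + threadIdx.x with x constant across the thread group, so each
    // thread reads from a different, consecutive bank.
    float element = A[pairindex * PAIR_STRIDE + matindex * MAT2_OFFSET + threadIdx.x];

    out[blockIdx.x * blockDim.x + threadIdx.x] = element;
}
```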
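For contrast, here is a sketch of the kind of non-adjacent access that does cause conflicts, using a hypothetical 16x16 tile and assuming 16 banks (current, pre-Fermi hardware) and a 16-thread launch. Reading down a column puts every thread of a half-warp in the same bank; the classic remedy is to pad each row by one word:

```cuda
// Hypothetical kernel contrasting a conflicting column read with the
// usual padding fix. Assumes a launch with 16 threads per block.
__global__ void columnRead(float *out, int col)
{
    __shared__ float tile[16][16];
    __shared__ float padded[16][17];  // one extra word of padding per row

    // Fill both tiles linearly (conflict-free) so the reads below
    // have defined data.
    for (int i = threadIdx.x; i < 16 * 16; i += blockDim.x) {
        tile[i / 16][i % 16]   = (float)i;
        padded[i / 16][i % 16] = (float)i;
    }
    __syncthreads();

    // Column access: consecutive threads read words 16 apart, a multiple
    // of the bank count, so all 16 threads land in ONE bank -- a 16-way
    // conflict that serialises the half-warp.
    float bad = tile[threadIdx.x][col];

    // With 17-word rows the stride is no longer a multiple of 16, so the
    // 16 reads spread across 16 different banks -- conflict-free.
    float good = padded[threadIdx.x][col];

    out[threadIdx.x] = bad + good;
}
```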
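Finally, a rough sketch of the two shared-memory addressing schemes the reduction sample walks through, heavily simplified and assuming 256-thread blocks; treat it as an outline of the idea, not the sample's actual code:

```cuda
#define BLOCK 256  // assumed block size for this sketch

// Strided addressing (an early step in the sample's progression): the
// index 2*s*tid places the active threads at banks (2*s*tid) mod 16,
// so multiple threads share a bank and the accesses serialise.
__global__ void reduceStrided(const float *in, float *out)
{
    __shared__ float sdata[BLOCK];
    unsigned int tid = threadIdx.x;
    sdata[tid] = in[blockIdx.x * BLOCK + tid];
    __syncthreads();

    for (unsigned int s = 1; s < BLOCK; s *= 2) {
        unsigned int i = 2 * s * tid;
        if (i < BLOCK)
            sdata[i] += sdata[i + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Sequential addressing (the optimised step): sdata[tid] and sdata[tid + s]
// are read by consecutive threads at consecutive words, so every access
// falls in a distinct bank -- no conflicts.
__global__ void reduceSequential(const float *in, float *out)
{
    __shared__ float sdata[BLOCK];
    unsigned int tid = threadIdx.x;
    sdata[tid] = in[blockIdx.x * BLOCK + tid];
    __syncthreads();

    for (unsigned int s = BLOCK / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}
```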
 
