Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>Banks are set up such that each successive 32 bits are in the next bank. So, if you declare an array of 4 byte floats, each subsequent float in the array will be in the next bank (modulo 16 or 32, depending on your architecture). I'll assume you're on compute capability 1.x, so you have a bank of width 16.</p> <p>If you have arrays of 18 and 16, things can be funny. You can avoid bank conflicts in the 16x16 array by declaring it like</p> <pre><code>__shared__ float sixteen[16][16+1] </code></pre> <p>which avoids bank conflicts when accessing transpose elements using threadIdx.x (as I assume you're doing if you're getting conflicts). When accessing elements in, say, the first row of a 16x16 matrix, they'll all reside in the 1st bank. What you want to do is have each of these in a successive bank. Padding does this for you. You treat the array exactly as you would before, as sixteen[row][column], or similarly for a flattened matrix, as sixteen[row*(16+1)+column], if you want.</p> <p>For the 18x18 case, when accessing in the transpose, you're moving at an even stride. The answer again is to pad by 1.</p> <pre><code>__shared__ float eighteens[18][18+1] </code></pre> <p>So now, when you access in the transpose (say accessing elements in the first column), it will access as (18+1)%16 = 3, and you'll access banks 3, 6, 9, 12, 15, 2, 5, 8 etc, so you should get no conflicts.</p> <p>The particular alignment shift due to having a matrix of size 18 isn't the problem, because the starting point of the array makes no difference, it's only the order in which you access it. If you want to flatten the arrays I've proposed above, and merge them into 1, that's fine, as long as you access them in a similar fashion.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload