For nested loops with CUDA
<p>I'm having a problem with some nested for loops that I have to convert from C/C++ into CUDA. Basically, I have four nested for loops that share the same array and perform bit-shift operations.</p>
<pre><code>#define N 65536
// ----------------------------------------------------------------------------------
int a1, a2, a3, a4, i1, i2, i3, i4;
int Bit4CBitmapLookUp[16] = {0, 1, 3, 3, 7, 7, 7, 7, 15, 15, 15, 15, 15, 15, 15, 15};
int _cBitmapLookupTable[N];
int s = 0; // index into the cBitmapLookupTable

for (i1 = 0; i1 &lt; 16; i1++) {
    // first customer
    a1 = Bit4CBitmapLookUp[i1] &lt;&lt; 12;
    for (i2 = 0; i2 &lt; 16; i2++) {
        // second customer
        a2 = Bit4CBitmapLookUp[i2] &lt;&lt; 8;
        for (i3 = 0; i3 &lt; 16; i3++) {
            // third customer
            a3 = Bit4CBitmapLookUp[i3] &lt;&lt; 4;
            for (i4 = 0; i4 &lt; 16; i4++) {
                // fourth customer
                a4 = Bit4CBitmapLookUp[i4];
                // now actually set the sBitmapLookupTable value
                _cBitmapLookupTable[s] = a1 | a2 | a3 | a4;
                s++;
            } // for i4
        } // for i3
    } // for i2
} // for i1
</code></pre>
<p>This is the code that I need to convert to CUDA. I have tried different approaches, but every time I get the wrong output. Here is my CUDA version (the relevant part of the kernel):</p>
<pre><code>#define N 16
//----------------------------------------------------------------------------------
// index for the GPU
int i1 = blockDim.x * blockIdx.x + threadIdx.x;
int i2 = blockDim.y * blockIdx.y + threadIdx.y;
int i3 = i1;
int i4 = i2;

__syncthreads();
for (i1 = i2 = 0; i1 &lt; N, i2 &lt; N; i1++, i2++) {
    // first customer
    a1 = Bit4CBitmapLookUp_device[i1] &lt;&lt; 12;
    // second customer
    a2 = Bit4CBitmapLookUp_device[i2] &lt;&lt; 8;
    for (i3 = i4 = 0; i3 &lt; N, i4 &lt; N; i3++, i4++) {
        // third customer
        a3 = Bit4CBitmapLookUp_device[i3] &lt;&lt; 4;
        // fourth customer
        a4 = Bit4CBitmapLookUp_device[i4];
        // now actually set the sBitmapLookupTable value
        _cBitmapLookupTable[s] = a1 | a2 | a3 | a4;
        s++;
    }
}
</code></pre>
<p>I'm brand new to CUDA and still learning, but I really can't find a solution for these nested for loops. Thank you in advance.</p>
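<p>One possible way to look at the conversion, offered as a sketch rather than a definitive answer: in the serial code the counter s runs from 0 to 65535 in step with the four loop indices, so i1, i2, i3 and i4 are simply the four 4-bit digits of s. Each table entry can therefore be computed independently from its flat index. The helper names entry_for_index and count_mismatches below are hypothetical and do not appear in the original post; the second one only checks the helper against the original nested loops on the host.</p>

```c
#include <stddef.h>

#define TABLE_SIZE 65536

static const int Bit4CBitmapLookUp[16] =
    {0, 1, 3, 3, 7, 7, 7, 7, 15, 15, 15, 15, 15, 15, 15, 15};

/* Hypothetical helper: computes entry s directly, without loop state.
 * The four loop indices are recovered as the 4-bit digits of s. */
static int entry_for_index(int s)
{
    int i1 = (s >> 12) & 0xF; /* first customer  */
    int i2 = (s >> 8) & 0xF;  /* second customer */
    int i3 = (s >> 4) & 0xF;  /* third customer  */
    int i4 = s & 0xF;         /* fourth customer */

    return (Bit4CBitmapLookUp[i1] << 12) |
           (Bit4CBitmapLookUp[i2] << 8) |
           (Bit4CBitmapLookUp[i3] << 4) |
            Bit4CBitmapLookUp[i4];
}

/* Hypothetical check: rebuilds the table with the original nested loops
 * and counts entries that differ from entry_for_index. */
static int count_mismatches(void)
{
    static int reference[TABLE_SIZE];
    int s = 0;
    int mismatches = 0;

    for (int i1 = 0; i1 < 16; i1++)
        for (int i2 = 0; i2 < 16; i2++)
            for (int i3 = 0; i3 < 16; i3++)
                for (int i4 = 0; i4 < 16; i4++) {
                    reference[s] = (Bit4CBitmapLookUp[i1] << 12) |
                                   (Bit4CBitmapLookUp[i2] << 8) |
                                   (Bit4CBitmapLookUp[i3] << 4) |
                                    Bit4CBitmapLookUp[i4];
                    s++;
                }

    for (s = 0; s < TABLE_SIZE; s++)
        if (reference[s] != entry_for_index(s))
            mismatches++;

    return mismatches;
}
```

<p>In a CUDA kernel, the same arithmetic could presumably be applied with one thread per table entry, taking s from blockIdx.x * blockDim.x + threadIdx.x and guarding with a bounds check against the table size. That launch layout is an assumption, not something stated in the question. It also avoids the shared counter s, and it covers all 65536 index combinations, whereas the posted kernel advances i1 with i2 and i3 with i4 in lockstep and so visits only 16 * 16 of them.</p>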