  1. CUDA threads appear to be out of sync
    <p>I have an issue where a single thread appears to trail behind the rest, even though I'm using <code>__syncthreads()</code>. The following extract is taken from a large program; I've cut out as much as I can while still reproducing the problem. When I run this code, the <code>test4</code> array does not hold the same value for all threads. My understanding is that the <code>TEST_FLAG</code> variable should lead all threads into the <code>if (TEST_FLAG == 2)</code> condition, so every element of <code>test4</code> should be 43. However, I find that all elements are 43 except the one for thread 0, which is 0. It appears as if the threads are not all reaching the same <code>__syncthreads()</code>. I've performed numerous tests and found that removing more of the code, such as the <code>for (l=0; l&lt;1; ++l)</code> loop, resolves the issue, but I do not understand why. Any help as to why my threads are not all returning the same value would be greatly appreciated.</p>
<pre><code>import numpy as np
import pycuda.driver as drv
import pycuda.compiler
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
from pycuda.compiler import SourceModule

gpu_code=SourceModule("""
__global__ void test_sync(double *test4, double *test5)
{
    __shared__ double rad_loc[2], boundary[2], boundary_limb_edge[2];
    __shared__ int TEST_FLAG;
    int l;

    if (blockIdx.x != 0)
    {
        return;
    }

    if(threadIdx.x == 0)
    {
        TEST_FLAG = 2;
        boundary[0] = 1;
    }

    test4[threadIdx.x] = 0;
    test5[threadIdx.x] = 0;

    if (threadIdx.x == 0)
    {
        rad_loc[0] = 0.0;
    }

    __syncthreads();

    for (l=0; l&lt;1; ++l)
    {
        __syncthreads();
        if (rad_loc[0] &gt; 0.0)
        {
            test5[threadIdx.x] += 1;
            if ((int)boundary[0] == -1)
            {
                __syncthreads();
                continue;
            }
        }
        else
        {
            if (threadIdx.x == 0)
            {
                boundary_limb_edge[0] = 0.0;
            }
        }

        __syncthreads();

        if (TEST_FLAG == 2)
        {
            test4[threadIdx.x] = 43;
            __syncthreads();
            TEST_FLAG = 99;
        }

        __syncthreads();
        return;
    }
    return;
}
""")

test_sync = gpu_code.get_function("test_sync")

DATA_ROWS=[100,100]
blockshape_data_mags = (int(64),1, 1)
gridshape_data_mags = (int(sum(DATA_ROWS)), 1)

test4 = np.zeros([1*blockshape_data_mags[0]], np.float64)
test5 = np.zeros([1*blockshape_data_mags[0]], np.float64)

test_sync(drv.InOut(test4), drv.InOut(test5), block=blockshape_data_mags, grid=gridshape_data_mags)

print test4
print test5
</code></pre>
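When the output array is larger than a handful of elements, it can help to list exactly which thread indices disagree with the expected value rather than reading the printed array by eye. The snippet below is a minimal host-side sketch of that check. `mismatched_threads` is a hypothetical helper, and `observed` is a hand-written stand-in for the symptom described in the question (thread 0 at 0, the rest at 43), not real kernel output.

```python
def mismatched_threads(values, expected):
    """Return the indices whose value differs from `expected`."""
    return [i for i, v in enumerate(values) if v != expected]


# Stand-in for the symptom in the question: 64 threads, thread 0 left at 0.
observed = [0.0] + [43.0] * 63

print(mismatched_threads(observed, 43.0))  # prints [0]
```

The helper only iterates over its argument, so it should also accept the NumPy array that comes back from the kernel launch (for example `mismatched_threads(test4, 43)`).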
    1. `__syncthreads()` might not have well-defined behavior inside conditional statements, even if all threads are in the same block of code. An example is given at http://stackoverflow.com/questions/12519573/cuda-syncthreads-inside-if-statements
    2. I was under the impression that it would only be problematic if not all threads followed the same branch, as in the example you linked, whereas in my example all threads follow the same path, so I wouldn't expect any issues with `__syncthreads()`. From [B.6. Synchronization Functions of the CUDA programming guide](http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf): "`__syncthreads()` is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects."
    3. Could you please provide a complete reproducer, rather than just the kernel? I have no idea what your launch configuration is, among other details, and I'd rather not guess. This is definitely weird code. For example, you are using the value of `boundary[0]` in a conditional before you initialize it. Note that SO expects: "Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance."
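The rule quoted from the programming guide (every thread in the block must reach the same barrier) can be illustrated on the CPU with Python's `threading.Barrier`. This is only an analogy under stated assumptions, not a model of how `__syncthreads()` behaves on a GPU: `run_block`, `skip_thread`, and the outcome labels are hypothetical names, and a timeout stands in for the hang the guide warns about.

```python
import threading


def run_block(num_threads, skip_thread=None, timeout=0.2):
    """Run `num_threads` workers that meet at one barrier.

    If `skip_thread` is set, that worker takes a divergent path and never
    reaches the barrier. Returns one outcome string per worker.
    """
    barrier = threading.Barrier(num_threads)
    outcome = [None] * num_threads

    def worker(tid):
        if tid == skip_thread:
            # Divergent branch: this worker returns without waiting.
            outcome[tid] = "skipped"
            return
        try:
            barrier.wait(timeout=timeout)
            outcome[tid] = "passed"
        except threading.BrokenBarrierError:
            # The barrier never saw all participants and timed out.
            outcome[tid] = "broken"

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


print(run_block(4))                 # every worker reaches the barrier
print(run_block(4, skip_thread=0))  # worker 0 skips it; the rest time out
```

On the CPU the missing participant shows up as a clean `BrokenBarrierError` after the timeout; on the GPU the guide only says the code is "likely to hang or produce unintended side effects", so the failure mode there is not expected to be this tidy.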