  1. Using __sync CUDA with global memory
    <p>I am trying to implement an asynchronous PSO. My approach was the following:</p>
<pre><code>__global__ void particle(double *pos, double *pbest, double *vpbest, double *vel, double *gbest){

    int thread = threadIdx.x + blockDim.x * blockIdx.x;
    int particle, i = 0;
    double tpbest;
    double l, r;
    int index, best, j;

    if(thread &lt; DIMPAR){
        particle = thread / NDIM;
        do{
            best = ring(vpbest, &amp;particle);
            index = (best * NDIM) + (thread % NDIM);
            l = (double) 2.05 * (double) uniform(thread) * ( pbest[thread] - pos[thread] );
            r = (double) 2.05 * (double) uniform(thread) * ( pbest[index] - pos[thread] );
            vel[thread] = vel[thread] + l + r;
            pos[thread] = pos[thread] + vel[thread];

            __syncthreads(); // trying to wait until all threads have written to global memory

            if( (thread % NDIM) == 0 ){ // only one thread per particle replaces the vector
                tpbest = rastrigin(pos, particle * NDIM, NDIM);
                if(tpbest &lt; vpbest[particle]){
                    vpbest[particle] = tpbest;
                    for(j = 0 ; j &lt; NDIM; j++){
                        pbest[(particle * NDIM) + j] = pos[(particle * NDIM) + j];
                    }
                }
            }
            i++;
        }while(i &lt; 10000);
    }
}
</code></pre>
<p>The call:</p>
<pre><code>particle&lt;&lt;&lt;1,512&gt;&gt;&gt;(d_pos, d_pbest, d_vpbest, d_velo, d_gbest);
</code></pre>
<p>Sometimes there is a synchronization problem: some values in pos[thread] diverge. Section B.6 of the CUDA C Programming Guide says that __syncthreads():</p>
<blockquote> <p>waits until all threads in the thread block have reached this point and all <strong>global</strong> and shared memory accesses made by these threads prior to __syncthreads() are visible to all threads in the block.</p> </blockquote>
<p>The pos vector is laid out like this:</p>
<p>p0 = [0,1,2] //particle 1</p>
<p>p1 = [3,4,5] //particle 2</p>
<p>p2 = [6,7,8] //particle 3</p>
<p>pos = [0,1,2,3,4,5,6,7,8] //pos vector, DIMPAR = 9; NPAR = 3; NDIM = 3</p>
<p>The divergence happens when I use NDIM &gt;= 30.</p>
<p>How can I ensure synchronization when using global memory?</p>
    1. `__syncthreads()` guarantees that previous updates to global or shared memory are visible to all threads in the block. It's not clear to me why you attribute the numerical divergence to a problem with `__syncthreads()`. How do you know the numerical divergence is not arising out of your arithmetic, just as it did in your [previous question](http://stackoverflow.com/questions/18829412/infinity-as-result-in-double-operation)?
    2. @RobertCrovella, thank you. When the loop is over, I pass the pos vector to a function called rastrigin in C code (a validated function) and compare the result with the vpbest[particle] value. At this point the values diverge. Given the same input values, I should always receive the same result, so it looks as if the pos vector and vpbest are out of sync. Thank you so much for your help.
    3. The bug could be anywhere, in my opinion. For questions like this, SO expects: "Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance." You haven't provided SSCCE.org-style code, and your reasons to suspect the `__syncthreads()` function seem obscure to me at best. Voting to close. Note that I'm not suggesting you dump a whole bunch of code in here. If the problem is as you describe, it should be possible to create a simple reproducer.
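As a starting point for such a reproducer, here is a minimal sketch of the barrier pattern the posted kernel appears to be aiming for. It is not the asker's PSO: the update rule, the constants `NDIM`, `NPAR`, `ITERS`, `THREADS`, and the names `two_phase`, `reference`, and `count_mismatches` are all hypothetical, and unsigned integers stand in for doubles so host and device results can be compared exactly. It illustrates two things that may matter in the posted code. First, `__syncthreads()` is only allowed in conditional code when the condition evaluates identically across the whole block, so here it sits outside the bounds guard. Second, a single barrier per iteration may let fast threads start the next iteration while a leader thread is still writing, so the sketch uses one barrier after each phase. Like the posted launch, it uses a single block, since `__syncthreads()` does not synchronize across blocks.

```cuda
// Sketch only. Assumes nvcc and a CUDA-capable device are available.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int NDIM = 32;              // dimensions per particle (hypothetical)
constexpr int NPAR = 8;               // number of particles (hypothetical)
constexpr int DIMPAR = NDIM * NPAR;   // total elements, must not exceed THREADS
constexpr int ITERS = 100;            // loop iterations (hypothetical)
constexpr int THREADS = 512;          // block size, as in the posted launch

__global__ void two_phase(unsigned *pos, unsigned *score) {
    const int t = threadIdx.x;          // single block, so no blockIdx term
    const bool active = t < DIMPAR;     // threads beyond the data only hit barriers
    const int p = active ? t / NDIM : 0;
    for (int i = 0; i < ITERS; ++i) {
        if (active) {
            // Phase 1: every element reads a neighbour's score and updates itself.
            pos[t] = pos[t] * 3u + score[(p + 1) % NPAR] + (unsigned)t;
        }
        __syncthreads();                // outside the guard: all 512 threads reach it
        if (active && t % NDIM == 0) {
            // Phase 2: one leader thread per particle rewrites that particle's score.
            unsigned s = 0;
            for (int j = 0; j < NDIM; ++j) s += pos[p * NDIM + j];
            score[p] = s;
        }
        __syncthreads();                // keeps phase 1 of the next iteration from
                                        // reading a score that is still being written
    }
}

// Sequential host version of the same two phases, used as the reference.
static void reference(unsigned *pos, unsigned *score) {
    for (int i = 0; i < ITERS; ++i) {
        for (int t = 0; t < DIMPAR; ++t)
            pos[t] = pos[t] * 3u + score[(t / NDIM + 1) % NPAR] + (unsigned)t;
        for (int p = 0; p < NPAR; ++p) {
            unsigned s = 0;
            for (int j = 0; j < NDIM; ++j) s += pos[p * NDIM + j];
            score[p] = s;
        }
    }
}

// Returns the number of elements that differ from the reference, or -1 on a CUDA error.
int count_mismatches() {
    unsigned h_pos[DIMPAR], h_score[NPAR], r_pos[DIMPAR], r_score[NPAR];
    for (int t = 0; t < DIMPAR; ++t) h_pos[t] = r_pos[t] = (unsigned)t;
    for (int p = 0; p < NPAR; ++p) h_score[p] = r_score[p] = (unsigned)p;

    unsigned *d_pos = nullptr, *d_score = nullptr;
    if (cudaMalloc(&d_pos, sizeof h_pos) != cudaSuccess) return -1;
    if (cudaMalloc(&d_score, sizeof h_score) != cudaSuccess) { cudaFree(d_pos); return -1; }
    cudaMemcpy(d_pos, h_pos, sizeof h_pos, cudaMemcpyHostToDevice);
    cudaMemcpy(d_score, h_score, sizeof h_score, cudaMemcpyHostToDevice);

    two_phase<<<1, THREADS>>>(d_pos, d_score);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(h_pos, d_pos, sizeof h_pos, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(h_score, d_score, sizeof h_score, cudaMemcpyDeviceToHost);
    cudaFree(d_pos);
    cudaFree(d_score);
    if (err != cudaSuccess) return -1;

    reference(r_pos, r_score);
    int bad = 0;
    for (int t = 0; t < DIMPAR; ++t) bad += (h_pos[t] != r_pos[t]);
    for (int p = 0; p < NPAR; ++p) bad += (h_score[p] != r_score[p]);
    return bad;
}

int main() {
    int bad = count_mismatches();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Removing the second `__syncthreads()`, or moving the first one back inside the `if (active)` guard, gives a variant to compare against when hunting for the divergence. Whether either change actually reproduces the behaviour described in the question depends on the device and on code that was not posted (`ring`, `uniform`, `rastrigin`).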