Note that there are some explanatory texts on larger screens.

plurals
  1. POOpenCL producing QNaN on NVidia hardware
    primarykey
    data
    text
    <p>I'm programming in OpenCL using the C++ bindings. I have a problem where on NVidia hardware, my OpenCL code is spontaneously producing very large numbers, and then (on the next run) a "1.#QNaN". My code is pretty much a simple physics simulation using the equation x = vt * .5at^2. The only thing I've noticed odd about it is that the velocities of the particles suddenly shoot up to about 6e+34, which I'm guessing is the maximum floating point value on that machine. However, the velocities/forces before then are quite small, often with values less than 1.</p> <p>The specific GPU I'm using is the Tesla M2050, with latest drivers. I prototype on my laptop using my AMD Fusion CPU as a platform (it does not have a dedicated GPU) and the problem does not occur there. I am not sure if this is an NVidia driver problem, a problem with my computation, or something else entirely.</p> <p>Here is the kernel code (<strong>Note: I'm reasonably sure mass is always nonzero</strong>):</p> <pre><code>_kernel void update_atom(__global float4 *pos, __global float4 *vel, __global float4 *force, __global const float *mass, __global const float *radius, const float timestep, const int wall) { // Get the index of the current element to be processed int i = get_global_id(0); float constMult; float4 accel; float4 part; //Update the position, velocity and force accel = (float4)(force[i].x/mass[i], force[i].y/mass[i], force[i].z/mass[i], 0.0f); constMult = .5*timestep*timestep; part = (float4)(constMult*accel.x, constMult*accel.y, constMult*accel.z, 0.0f); pos[i] = (float4)(pos[i].x + vel[i].x*timestep + part.x, pos[i].y + vel[i].y*timestep + part.y, pos[i].z + vel[i].z*timestep + part.z, 0.0f); vel[i] = (float4)(vel[i].x + accel.x*timestep, vel[i].y + accel.y*timestep, vel[i].z + accel.z*timestep, 0.0f); force[i] = (float4)(force[i].x, force[i].y, force[i].z, 0.0f); //Do reflections off the wall //http://www.3dkingdoms.com/weekly/weekly.php?a=2 float4 norm; float bouncePos = wall - radius[i]; float bounceNeg = radius[i] - wall; norm = (float4)(0.0f, 0.0f, 0.0f, 0.0f); if(pos[i].x &gt;= bouncePos) { //Normal is unit YZ vector pos[i].x = bouncePos; norm.x = 1.0f; } else if(pos[i].x &lt;= bounceNeg) { pos[i].x = bounceNeg; norm.x = -1.0f; } if(pos[i].y &gt;= bouncePos) { //Normal is unit XZ vector pos[i].y = bouncePos; norm.y = 1.0f; } else if(pos[i].y &lt;= bounceNeg) { //etc pos[i].y = bounceNeg; norm.y = -1.0f; } if(pos[i].z &gt;= bouncePos) { pos[i].z = bouncePos; norm.z = 1.0f; } else if(pos[i].z &lt;= bounceNeg) { pos[i].z = bounceNeg; norm.z = -1.0f; } float dot = 2 * (vel[i].x * norm.x + vel[i].y * norm.y + vel[i].z * norm.z); vel[i].x = vel[i].x - dot * norm.x; vel[i].y = vel[i].y - dot * norm.y; vel[i].z = vel[i].z - dot * norm.z; } </code></pre> <p>And here's how I store the information into the kernel. PutData just uses the std::vector::push_back function on the positions, velocities and forces of the atoms into the corresponding vectors, and kernel is just a wrapper class I wrote for the OpenCL libraries (you can trust me that I put the right parameters into the right places for the enqueue functions).</p> <pre><code>void LoadAtoms(int kernelNum, bool blocking) { std::vector&lt;cl_float4&gt; atomPos; std::vector&lt;cl_float4&gt; atomVel; std::vector&lt;cl_float4&gt; atomForce; for(int i = 0; i &lt; numParticles; i++) atomList[i].PutData(atomPos, atomVel, atomForce); kernel.EnqueueWriteBuffer(kernelNum, posBuf, blocking, 0, numParticles*sizeof(cl_float4), &amp;atomPos[0]); kernel.EnqueueWriteBuffer(kernelNum, velBuf, blocking, 0, numParticles*sizeof(cl_float4), &amp;atomVel[0]); kernel.EnqueueWriteBuffer(kernelNum, forceBuf, blocking, 0, numParticles*sizeof(cl_float4), &amp;atomForce[0]); } void LoadAtomTypes(int kernelNum, bool blocking) { std::vector&lt;cl_float&gt; mass; std::vector&lt;cl_float&gt; radius; int type; for(int i = 0; i &lt; numParticles; i++) { type = atomList[i].GetType(); mass.push_back(atomTypes[type].mass); radius.push_back(atomTypes[type].radius); } kernel.EnqueueWriteBuffer(kernelNum, massBuf, blocking, 0, numParticles*sizeof(cl_float), &amp;mass[0]); kernel.EnqueueWriteBuffer(kernelNum, radiusBuf, blocking, 0, numParticles*sizeof(cl_float), &amp;radius[0]); } </code></pre> <p>There is more to my code, as always, but this is what's related to the kernel.</p> <p>I saw <a href="https://stackoverflow.com/questions/6591144/opencl-producing-incorrect-calculations">this question</a>, which is similar, but I use cl_float4 everywhere I can so I don't believe it's an issue of alignment. There aren't really any other related questions.</p> <p>I realize this probably isn't a simple question, but I've run out of ideas until we can get new hardware in the office to test on. Can anyone help me out?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload