StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POOpenCL producing QNaN on NVidia hardware
primarykey
Id
10839379
data
AcceptedAnswerId
10874707
AnswerCount
1
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2012-05-31T18:42:43.007
FavoriteCount
0
LastActivityDate
2012-06-03T23:13:16.380
LastEditDate
2017-05-23T11:56:29.837
LastEditorUserId
-1
OwnerUserId
1110292
ParentId
0
PostTypeId
1
Score
1
ViewCount
267
LastEditorDisplayName
text
Body
I'm programming in OpenCL using the C++ bindings. I have a problem where on NVidia hardware, my OpenCL code is spontaneously producing very large numbers, and then (on the next run) a "1.#QNaN". My code is pretty much a simple physics simulation using the equation x = vt * .5at^2. The only thing I've noticed odd about it is that the velocities of the particles suddenly shoot up to about 6e+34, which I'm guessing is the maximum floating point value on that machine. However, the velocities/forces before then are quite small, often with values less than 1. The specific GPU I'm using is the Tesla M2050, with latest drivers. I prototype on my laptop using my AMD Fusion CPU as a platform (it does not have a dedicated GPU) and the problem does not occur there. I am not sure if this is an NVidia driver problem, a problem with my computation, or something else entirely. Here is the kernel code (Note: I'm reasonably sure mass is always nonzero): <pre><code>_kernel void update_atom(__global float4 *pos, __global float4 *vel, __global float4 *force, __global const float *mass, __global const float *radius, const float timestep, const int wall) { // Get the index of the current element to be processed int i = get_global_id(0); float constMult; float4 accel; float4 part; //Update the position, velocity and force accel = (float4)(force[i].x/mass[i], force[i].y/mass[i], force[i].z/mass[i], 0.0f); constMult = .5*timestep*timestep; part = (float4)(constMult*accel.x, constMult*accel.y, constMult*accel.z, 0.0f); pos[i] = (float4)(pos[i].x + vel[i].x*timestep + part.x, pos[i].y + vel[i].y*timestep + part.y, pos[i].z + vel[i].z*timestep + part.z, 0.0f); vel[i] = (float4)(vel[i].x + accel.x*timestep, vel[i].y + accel.y*timestep, vel[i].z + accel.z*timestep, 0.0f); force[i] = (float4)(force[i].x, force[i].y, force[i].z, 0.0f); //Do reflections off the wall //http://www.3dkingdoms.com/weekly/weekly.php?a=2 float4 norm; float bouncePos = wall - radius[i]; float bounceNeg = radius[i] - wall; norm = (float4)(0.0f, 0.0f, 0.0f, 0.0f); if(pos[i].x >= bouncePos) { //Normal is unit YZ vector pos[i].x = bouncePos; norm.x = 1.0f; } else if(pos[i].x <= bounceNeg) { pos[i].x = bounceNeg; norm.x = -1.0f; } if(pos[i].y >= bouncePos) { //Normal is unit XZ vector pos[i].y = bouncePos; norm.y = 1.0f; } else if(pos[i].y <= bounceNeg) { //etc pos[i].y = bounceNeg; norm.y = -1.0f; } if(pos[i].z >= bouncePos) { pos[i].z = bouncePos; norm.z = 1.0f; } else if(pos[i].z <= bounceNeg) { pos[i].z = bounceNeg; norm.z = -1.0f; } float dot = 2 * (vel[i].x * norm.x + vel[i].y * norm.y + vel[i].z * norm.z); vel[i].x = vel[i].x - dot * norm.x; vel[i].y = vel[i].y - dot * norm.y; vel[i].z = vel[i].z - dot * norm.z; } </code></pre> And here's how I store the information into the kernel. PutData just uses the std::vector::push_back function on the positions, velocities and forces of the atoms into the corresponding vectors, and kernel is just a wrapper class I wrote for the OpenCL libraries (you can trust me that I put the right parameters into the right places for the enqueue functions). <pre><code>void LoadAtoms(int kernelNum, bool blocking) { std::vector<cl_float4> atomPos; std::vector<cl_float4> atomVel; std::vector<cl_float4> atomForce; for(int i = 0; i < numParticles; i++) atomList[i].PutData(atomPos, atomVel, atomForce); kernel.EnqueueWriteBuffer(kernelNum, posBuf, blocking, 0, numParticles*sizeof(cl_float4), &atomPos[0]); kernel.EnqueueWriteBuffer(kernelNum, velBuf, blocking, 0, numParticles*sizeof(cl_float4), &atomVel[0]); kernel.EnqueueWriteBuffer(kernelNum, forceBuf, blocking, 0, numParticles*sizeof(cl_float4), &atomForce[0]); } void LoadAtomTypes(int kernelNum, bool blocking) { std::vector<cl_float> mass; std::vector<cl_float> radius; int type; for(int i = 0; i < numParticles; i++) { type = atomList[i].GetType(); mass.push_back(atomTypes[type].mass); radius.push_back(atomTypes[type].radius); } kernel.EnqueueWriteBuffer(kernelNum, massBuf, blocking, 0, numParticles*sizeof(cl_float), &mass[0]); kernel.EnqueueWriteBuffer(kernelNum, radiusBuf, blocking, 0, numParticles*sizeof(cl_float), &radius[0]); } </code></pre> There is more to my code, as always, but this is what's related to the kernel. I saw <a href="https://stackoverflow.com/questions/6591144/opencl-producing-incorrect-calculations">this question</a>, which is similar, but I use cl_float4 everywhere I can so I don't believe it's an issue of alignment. There aren't really any other related questions. I realize this probably isn't a simple question, but I've run out of ideas until we can get new hardware in the office to test on. Can anyone help me out?
Tags
<c++><opencl><gpgpu>
Title
OpenCL producing QNaN on NVidia hardware
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USChaosed0
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.