<p>You are invoking undefined behaviour. If you want to read memory that another thread in the workgroup is writing, you must use barriers.</p>
<p>In addition, assume the GPU is running 2 wavefronts at once. Then dataSet[65] does contain the correct value; the first wavefront simply has not completed yet.</p>
<p>An output of 0 for all items is also a valid result according to the spec, because everything could just as well be executed completely serially. That is why you need the barriers.</p>
<p>Based on your comments I edited this part:</p>
<p>Install <a href="http://developer.amd.com/tools-and-sdks/heterogeneous-computing/codexl/" rel="nofollow">http://developer.amd.com/tools-and-sdks/heterogeneous-computing/codexl/</a> Read: <a href="http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf" rel="nofollow">http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf</a></p>
<p>Optimizing branching within a certain number of threads is only a small part of optimization. You should read up on how AMD hardware schedules the wavefronts within a workgroup and how it hides memory latency by interleaving the execution of wavefronts (within a workgroup). Branching also affects the execution of the whole workgroup, because the effective time to run it is basically the time taken by the single longest-running wavefront (the hardware cannot free local memory etc. until everything in the group has finished, so it cannot schedule another workgroup). This also depends on your local memory and register usage, among other things. To see what actually happens, grab CodeXL and do a GPU profiling run. That will show exactly what happens on the device.</p>
<p>And even this applies only to the current hardware generation. That is why the concept is not in the OpenCL specification itself. These properties change a lot and depend heavily on the hardware.</p>
<p>But if you just want to know the AMD wavefront size, the answer is pretty much always 64 (see <a href="http://devgurus.amd.com/thread/159153" rel="nofollow">http://devgurus.amd.com/thread/159153</a> for a reference to their OpenCL programming guide). It is 64 for all GCN devices, which make up their whole current lineup. Some older devices may have 16 or 32, but right now everything is 64 (for NVIDIA it is 32 in general).</p>
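<p>To make the barrier point from the start of this answer concrete, here is a rough host-side sketch in plain C. It is not OpenCL and not your kernel: <code>GROUP_SIZE</code>, <code>run_without_barrier</code> and <code>run_with_barrier</code> are made-up names, and the "work-items" are simply run one after another, which is one of the execution orders the spec permits. Each item writes its own slot and reads its right-hand neighbour's slot.</p>

```c
#include <string.h>

#define GROUP_SIZE 8 /* hypothetical work-group size, chosen for illustration */

/* No barrier: every emulated work-item writes its own slot and immediately
 * reads its right-hand neighbour's slot. Run serially in id order, the
 * neighbour has not written yet, so the read sees the initial 0. */
static void run_without_barrier(int out[GROUP_SIZE])
{
    int shared[GROUP_SIZE];
    memset(shared, 0, sizeof shared);
    for (int id = 0; id < GROUP_SIZE; ++id) {
        shared[id] = id + 1;                                  /* write own slot */
        out[id] = (id + 1 < GROUP_SIZE) ? shared[id + 1] : -1; /* read neighbour */
    }
}

/* With a barrier: all writes finish before any read starts. In a real
 * OpenCL kernel the split between the two loops is where
 * barrier(CLK_LOCAL_MEM_FENCE) would go. */
static void run_with_barrier(int out[GROUP_SIZE])
{
    int shared[GROUP_SIZE];
    memset(shared, 0, sizeof shared);
    for (int id = 0; id < GROUP_SIZE; ++id)
        shared[id] = id + 1;                                  /* phase 1: writes */
    /* ---- barrier would sit here ---- */
    for (int id = 0; id < GROUP_SIZE; ++id)
        out[id] = (id + 1 < GROUP_SIZE) ? shared[id + 1] : -1; /* phase 2: reads */
}
```

<p>Calling <code>run_without_barrier</code> fills the output with zeros (apart from the last item, which has no neighbour and gets the -1 sentinel), while <code>run_with_barrier</code> yields the neighbours' values. On a real device the barrier-free version may happen to produce some correct values, depending on how wavefronts are scheduled, which is exactly why it cannot be relied on.</p>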
 
