<p>You write that you want to avoid branching on the GPU. It is true that branching can be very costly in a parallel environment, because either both branches have to be evaluated or synchronization has to be applied. But if the branches are small enough, the code will be faster than most arithmetic. The <a href="http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Best_Practices_Guide.pdf" rel="nofollow">CUDA C Best Practices Guide</a> describes why:</p>
<blockquote> <p>Sometimes, the compiler may [..] optimize out if or switch statements by using branch predication instead. In these cases, no warp can ever diverge. [..]</p> <p>When using branch predication none of the instructions whose execution depends on the controlling condition gets skipped. Instead, each of them is associated with a per-thread condition code or predicate that is set to true or false based on the controlling condition and although each of these instructions gets scheduled for execution, only the instructions with a true predicate are actually executed. Instructions with a false predicate do not write results, and also do not evaluate addresses or read operands.</p> </blockquote>
<p>Branch predication is fast. Bloody fast! If you look at the intermediate PTX code generated by the optimizing compiler, you will see that it is superior to even modest arithmetic. So code like that in davmac's answer is probably as fast as it can get.</p>
<p>I know you did not ask specifically about CUDA, but most of the best practices guide also applies to OpenCL and probably to large parts of AMD's GPU programming.</p>
<p>BTW: in virtually every case of GPU code I have ever seen, most of the time is spent on memory access, not on arithmetic. Make sure to profile! <a href="http://en.wikipedia.org/wiki/Program_optimization" rel="nofollow">http://en.wikipedia.org/wiki/Program_optimization</a></p>
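<p>To make the comparison concrete, here is a minimal host-side C++ sketch of the two styles being weighed: a short branch of the kind a compiler may turn into predicated instructions, and an arithmetic rewrite that avoids the branch. The names <code>select_branch</code>, <code>select_arith</code> and <code>helpers_agree</code> are hypothetical and are not taken from the question or from davmac's answer. In CUDA the two helpers would be device functions. Whether a given compiler actually predicates the branch depends on the compiler and the target, so treat this as an illustration of equivalence, not as a measurement.</p>

```cpp
#include <initializer_list>

// Hypothetical helper: pick a or b with a short branch.
// A body this small is a typical candidate for branch predication.
int select_branch(bool cond, int a, int b) {
    if (cond) {
        return a;
    }
    return b;
}

// Hypothetical helper: the same selection written as branch-free arithmetic.
// c is 0 or 1, so exactly one of the two products survives.
int select_arith(bool cond, int a, int b) {
    const int c = static_cast<int>(cond);
    return c * a + (1 - c) * b;
}

// Returns true when both helpers agree on a few sample inputs.
bool helpers_agree() {
    const int samples[] = {-5, 0, 1, 42};
    for (int a : samples) {
        for (int b : samples) {
            for (bool cond : {false, true}) {
                if (select_branch(cond, a, b) != select_arith(cond, a, b)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

<p>The arithmetic version costs two multiplications, a subtraction and an addition per call, while the branch version only needs to select one of two values. That is why a predicated short branch can come out ahead. As the answer says, profile before committing to either form.</p>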