<p>A1. Choosing the number of threads per block is largely a matter of heuristics. You could simply try each multiple of the warp size (32) and keep whichever value runs fastest:</p>
<pre><code>for (int threadsPerBlock = 32; threadsPerBlock &lt;= 512; threadsPerBlock += 32) {...}</code></pre>
<p>A2. Currently you use one thread per row and accumulate the elements into <code>squareEuclDist</code> sequentially. Consider using one thread block per row instead: within the block, each thread computes the squared difference of one element, and a parallel reduction sums the results. The following document explains parallel reduction:</p>
<p><a href="http://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf" rel="nofollow">http://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf</a></p>
<p>A3. The figures you list are the total amounts of global and shared memory. Multiple threads share these hardware resources. The following tool in your CUDA installation directory helps you work out how much of each resource is available per thread in a particular kernel:</p>
<pre><code>$CUDA_HOME/tools/CUDA_Occupancy_Calculator.xls</code></pre>
<p>A4. <code>maximum sizes of each dimension</code> does not mean that all dimensions can reach their maximum at the same time. For a block, the product of the dimensions is also capped by the maximum number of threads per block. There is no such limit on the number of blocks per grid, however, so a grid of 65535x65535x1 blocks is possible.</p>
<p>A5. The memory clock rate has nothing to do with the number of threads. See the programming model section of the CUDA documentation for more information:</p>
<p><a href="http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#scalable-programming-model" rel="nofollow">http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#scalable-programming-model</a></p>
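<p>As a rough illustration of the block-per-row idea in A2, here is a minimal sketch. The kernel name <code>squaredDistPerRow</code>, the arrays <code>data</code>, <code>query</code> and <code>dist</code>, the row-major layout, and the sizes in <code>main</code> are all assumptions made for the example, not taken from your code. It also assumes a power-of-two block size and a device that supports managed memory (<code>cudaMallocManaged</code>). Because each thread strides over the row, the row length does not have to match the block size.</p>
<pre><code>#include &lt;cstdio&gt;
#include &lt;cuda_runtime.h&gt;

// Hypothetical kernel: one block per row of `data` (row-major, rows x cols).
// Writes the squared Euclidean distance between each row and `query`.
// Assumes blockDim.x is a power of two and that the launch provides
// blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void squaredDistPerRow(const float *data, const float *query,
                                  float *dist, int cols)
{
    extern __shared__ float partial[];
    int row = blockIdx.x;
    int tid = threadIdx.x;

    // Each thread accumulates a strided subset of the squared differences.
    float sum = 0.0f;
    for (int c = tid; c &lt; cols; c += blockDim.x) {
        float d = data[row * cols + c] - query[c];
        sum += d * d;
    }
    partial[tid] = sum;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride &gt; 0; stride &gt;&gt;= 1) {
        if (tid &lt; stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    // Thread 0 holds the total for this row.
    if (tid == 0) dist[row] = partial[0];
}

int main()
{
    // Illustrative sizes only.
    const int rows = 4, cols = 1000, threads = 128;
    float *data, *query, *dist;
    cudaMallocManaged(&amp;data, rows * cols * sizeof(float));
    cudaMallocManaged(&amp;query, cols * sizeof(float));
    cudaMallocManaged(&amp;dist, rows * sizeof(float));

    // Row r is filled with the value r and the query is all zeros,
    // so the squared distance of row r should be r * r * cols.
    for (int c = 0; c &lt; cols; ++c) query[c] = 0.0f;
    for (int r = 0; r &lt; rows; ++r)
        for (int c = 0; c &lt; cols; ++c) data[r * cols + c] = (float)r;

    squaredDistPerRow&lt;&lt;&lt;rows, threads, threads * sizeof(float)&gt;&gt;&gt;(data, query, dist, cols);
    cudaDeviceSynchronize();

    for (int r = 0; r &lt; rows; ++r) printf("row %d: %f\n", r, dist[r]);

    cudaFree(data);
    cudaFree(query);
    cudaFree(dist);
    return 0;
}</code></pre>
<p>This is the simplest shared-memory tree reduction. The reduction document linked in A2 describes further optimizations, such as unrolling the last warp, that you could apply once the basic version works.</p>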