<p>In my own code, I use the <code>clock()</code> function to get precise timings. For convenience, I have the macros</p>
<pre><code>enum { tid_this = 0, tid_that, tid_count };
__device__ float cuda_timers[ tid_count ];

#ifdef USETIMERS
    /* Thread 0 of each block records the start tick. */
    #define TIMER_TIC clock_t tic; if ( threadIdx.x == 0 ) tic = clock();
    /* Thread 0 adds the elapsed ticks to cuda_timers[tid], allowing for one counter wrap-around. */
    #define TIMER_TOC(tid) clock_t toc = clock(); if ( threadIdx.x == 0 ) atomicAdd( &amp;cuda_timers[tid] , ( toc &gt; tic ) ? (toc - tic) : ( toc + (0xffffffff - tic) ) );
#else
    #define TIMER_TIC
    #define TIMER_TOC(tid)
#endif
</code></pre>
<p>These can then be used to instrument the device code as follows:</p>
<pre><code>__global__ void mykernel ( ... ) {

    /* Start the timer. */
    TIMER_TIC

    /* Do stuff. */
    ...

    /* Stop the timer and store the results to the "tid_this" counter. */
    TIMER_TOC( tid_this );

}
</code></pre>
<p>You can then read the <code>cuda_timers</code> in the host code.</p>
<p>A few notes:</p>
<ul>
<li>The timers work on a per-block basis, i.e. if you have 100 blocks executing the same kernel, the sum of all their times will be stored.</li>
<li>Having said that, the timer assumes that the zeroth thread is active, so make sure you do not call these macros in a possibly divergent part of the code.</li>
<li>The timers count the number of clock ticks. To get the number of milliseconds, divide this by the clock rate of your device in kHz (i.e. the frequency in GHz times 10<sup>6</sup>).</li>
<li>The timers can slow down your code a bit, which is why I wrapped them in the <code>#ifdef USETIMERS</code> so you can switch them off easily.</li>
<li>Although <code>clock()</code> returns integer values of type <code>clock_t</code>, I store the accumulated values as <code>float</code>, otherwise the values will wrap around for kernels that take longer than a few seconds (accumulated over all blocks).</li>
<li>The selection <code>( toc &gt; tic ) ? (toc - tic) : ( toc + (0xffffffff - tic) ) )</code> is necessary in case the clock counter wraps around.</li>
</ul>
<p>P.S.
This is a copy of my reply to <a href="https://stackoverflow.com/questions/11070298/timing-cuda-kernels/11070746#11070746">this question</a>, which didn't get many points there since the timing required was for the whole kernel.</p>
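<p>The wrap-around handling and the tick-to-millisecond conversion from the notes above can be checked on the host without a GPU. The following is only a sketch in plain C (it should also compile as CUDA host code). It assumes the counter is a 32-bit value that wraps at 2<sup>32</sup>, as the <code>0xffffffff</code> constant in the macro suggests. The names <code>elapsed_ticks</code>, <code>ticks_to_ms</code> and <code>clock_khz</code> are hypothetical and not part of the original macros.</p>

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a 32-bit counter.
 * Assumption: the counter wraps at 2^32. Unsigned subtraction is
 * performed modulo 2^32, so a single wrap-around between tic and toc
 * is handled without an explicit branch. */
static uint32_t elapsed_ticks(uint32_t tic, uint32_t toc) {
    return toc - tic;
}

/* Convert an accumulated tick count to milliseconds.
 * clock_khz is the device clock rate in kHz (hypothetical parameter;
 * on a real device it would be queried at run time).
 * ticks / (ticks per millisecond) = milliseconds. */
static double ticks_to_ms(double ticks, double clock_khz) {
    return ticks / clock_khz;
}
```

<p>For example, <code>elapsed_ticks(0xfffffff0u, 0x10u)</code> gives 32 ticks across the wrap, and 1.5e6 ticks on a 1.5 GHz device (1.5e6 kHz) correspond to 1 ms. Note that the accumulated sum over all blocks still has to be divided by the number of blocks if a per-block average is wanted.</p>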
 
