<p>While there is no official answer to this, you can use atomic operations to record when each block begins its work and when it ends.</p>

<p>Try playing with the following code:</p>

<pre><code>#include &lt;stdio.h&gt;

//Number of blocks of size 512 threads on current device required to achieve full occupancy
const int maxBlocks=60;

__global__ void emptyKernel() {}

__global__ void myKernel(int *control, int *output) {
    if (threadIdx.x==1) {
        //register that we enter
        int enter=atomicAdd(control,1);
        output[blockIdx.x]=enter;

        //some intensive and long task
        int &amp;var=output[blockIdx.x+gridDim.x]; //var references global memory
        var=1;
        for (int i=0; i&lt;12345678; ++i) {
            var+=1+tanhf(var);
        }

        //register that we quit
        var=atomicAdd(control,1);
    }
}

int main() {
    //shared counter, starts at 0
    int *gpuControl;
    cudaMalloc((void**)&amp;gpuControl, sizeof(int));
    int cpuControl=0;
    cudaMemcpy(gpuControl,&amp;cpuControl,sizeof(int),cudaMemcpyHostToDevice);

    //first half holds the enter values, second half the exit values
    int *gpuOutput;
    cudaMalloc((void**)&amp;gpuOutput, sizeof(int)*maxBlocks*2);
    int cpuOutput[maxBlocks*2];
    for (int i=0; i&lt;maxBlocks*2; ++i) //clear the host array just to be on the safe side
        cpuOutput[i]=-1;

    // play with these values
    const int thr=479;
    const int p=13;
    const int q=maxBlocks;

    //I found that this may actually affect the scheduler! Try with and without this call.
    emptyKernel&lt;&lt;&lt;p,thr&gt;&gt;&gt;();

    cudaEvent_t timerStart;
    cudaEvent_t timerStop;
    cudaEventCreate(&amp;timerStart);
    cudaEventCreate(&amp;timerStop);

    cudaDeviceSynchronize(); //replaces the deprecated cudaThreadSynchronize()

    cudaEventRecord(timerStart,0);
    myKernel&lt;&lt;&lt;q,512&gt;&gt;&gt;(gpuControl, gpuOutput);
    cudaEventRecord(timerStop,0);
    cudaEventSynchronize(timerStop);

    cudaMemcpy(cpuOutput,gpuOutput,sizeof(int)*maxBlocks*2,cudaMemcpyDeviceToHost);
    cudaDeviceSynchronize();

    float thisTime;
    cudaEventElapsedTime(&amp;thisTime,timerStart,timerStop);
    cudaEventDestroy(timerStart);
    cudaEventDestroy(timerStop);

    printf("Elapsed time: %f\n",thisTime);
    for (int i=0; i&lt;q; ++i)
        printf("%d: %d-%d\n",i,cpuOutput[i],cpuOutput[i+q]);

    cudaFree(gpuOutput);
    cudaFree(gpuControl);
    return 0;
}
</code></pre>

<p>What you get in the output is the block ID, followed by the enter "time" and exit "time". These "times" are values of the shared counter, not clock readings, so they tell you in which order those events occurred.</p>
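<p>Since the enter and exit values all come from one counter, the order of events can also be turned into a single number: the most blocks that were running at the same moment. The following is a minimal host-side sketch of that idea, not part of the code above. It assumes the enter and exit values of each block have been copied into two host vectors; <code>peakConcurrentBlocks</code> is a hypothetical helper name.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (an illustration, not part of the original program).
// enter[i] and leave[i] are the counter values block i recorded when it
// started and when it finished. With n blocks and one shared counter that
// starts at 0, every value is expected to lie in [0, 2n).
// Returns the largest number of blocks running at the same time,
// or -1 if the input is inconsistent (e.g. a block never ran and still holds -1).
int peakConcurrentBlocks(const std::vector<int>& enter,
                         const std::vector<int>& leave) {
    if (enter.size() != leave.size()) return -1;
    const int n = static_cast<int>(enter.size());

    // delta[t] is +1 if some block entered at counter value t, -1 if one left.
    std::vector<int> delta(2 * static_cast<std::size_t>(n), 0);
    for (int i = 0; i < n; ++i) {
        if (enter[i] < 0 || enter[i] >= 2 * n) return -1;
        if (leave[i] < 0 || leave[i] >= 2 * n) return -1;
        delta[enter[i]] += 1;
        delta[leave[i]] -= 1;
    }

    // Sweep through the counter values in order and track the running total.
    int running = 0;
    int peak = 0;
    for (int d : delta) {
        running += d;
        if (running > peak) peak = running;
    }
    return peak;
}
```

<p>With the program above, the two vectors would be built from <code>cpuOutput[0..q-1]</code> and <code>cpuOutput[q..2q-1]</code>. If the result is smaller than the number of blocks launched, some blocks had to wait until earlier ones finished.</p>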