  1. Kernel delay increases when increasing blocksPerGrid and threadsPerBlock in the CUDA VecAdd example: what does it mean?
    <p>When I tested the following example, I found that increasing blocksPerGrid and threadsPerBlock increases the kernel delay.</p>
    <p>With</p>
    <pre><code>int threadsPerBlock = 1;
int blocksPerGrid = 1;
</code></pre>
    <p>that is, with blocksPerGrid and threadsPerBlock both equal to 1, the kernel delay is 0.0072 ms.</p>
    <p>But with the following settings the delay is higher, 0.049 ms:</p>
    <pre><code>int threadsPerBlock = 1024;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
</code></pre>
    <p>where</p>
    <pre><code>N = 50000; // the number of array elements
</code></pre>
    <p>The complete VecAdd example follows; you can test it yourself.</p>
    <pre><code>// Includes
#include &lt;stdio.h&gt;
#include &lt;cutil_inline.h&gt;
#include &lt;shrQATest.h&gt;

// Variables
float* h_A;
float* h_B;
float* h_C;
float* d_A;
float* d_B;
float* d_C;
bool noprompt = false;

// Functions
void CleanupResources(void);
void RandomInit(float*, int);
void ParseArguments(int, char**);

// Device code
__global__ void VecAdd(const float* A, const float* B, float* C, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i &lt; N)
        C[i] = A[i] + B[i];
}

// Host code
int main(int argc, char** argv)
{
    shrQAStart(argc, argv);

    cudaEvent_t event1, event2;
    cudaEventCreate(&amp;event1);
    cudaEventCreate(&amp;event2);

    printf("Vector Addition\n");
    int N = 50000;
    size_t size = N * sizeof(float);
    ParseArguments(argc, argv);

    // Allocate input vectors h_A and h_B in host memory
    h_A = (float*)malloc(size);
    if (h_A == 0) CleanupResources();
    h_B = (float*)malloc(size);
    if (h_B == 0) CleanupResources();
    h_C = (float*)malloc(size);
    if (h_C == 0) CleanupResources();

    // Initialize input vectors
    RandomInit(h_A, N);
    RandomInit(h_B, N);

    // Allocate vectors in device memory
    cutilSafeCall( cudaMalloc((void**)&amp;d_A, size) );
    cutilSafeCall( cudaMalloc((void**)&amp;d_B, size) );
    cutilSafeCall( cudaMalloc((void**)&amp;d_C, size) );

    // Copy vectors from host memory to device memory
    cutilSafeCall( cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice) );
    cutilSafeCall( cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice) );

    // Invoke kernel
    int threadsPerBlock = 1024;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;

    cudaEventRecord(event1, 0);
    VecAdd&lt;&lt;&lt;blocksPerGrid, threadsPerBlock&gt;&gt;&gt;(d_A, d_B, d_C, N);
    cudaEventRecord(event2, 0);

    cudaEventSynchronize(event1); // optional
    cudaEventSynchronize(event2);

    float dt_ms;
    cudaEventElapsedTime(&amp;dt_ms, event1, event2);
    printf("delay_time = %f\n", dt_ms);

    cutilCheckMsg("kernel launch failure");
#ifdef _DEBUG
    cutilSafeCall( cutilDeviceSynchronize() );
#endif

    // Copy result from device memory to host memory
    // h_C contains the result in host memory
    cutilSafeCall( cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost) );

    // Verify result
    int i;
    for (i = 0; i &lt; N; ++i) {
        float sum = h_A[i] + h_B[i];
        if (fabs(h_C[i] - sum) &gt; 1e-5)
            break;
    }

    CleanupResources();
    shrQAFinishExit(argc, (const char **)argv, (i==N) ? QA_PASSED : QA_FAILED);
}

void CleanupResources(void)
{
    // Free device memory
    if (d_A) cudaFree(d_A);
    if (d_B) cudaFree(d_B);
    if (d_C) cudaFree(d_C);

    // Free host memory
    if (h_A) free(h_A);
    if (h_B) free(h_B);
    if (h_C) free(h_C);

    cutilDeviceReset();
}

// Allocates an array with random float entries.
void RandomInit(float* data, int n)
{
    for (int i = 0; i &lt; n; ++i)
        data[i] = rand() / (float)RAND_MAX;
}

// Parse program arguments
void ParseArguments(int argc, char** argv)
{
    for (int i = 0; i &lt; argc; ++i) {
        if (strcmp(argv[i], "--noprompt") == 0 ||
            strcmp(argv[i], "-noprompt") == 0) {
            noprompt = true;
            break;
        }
    }
}
</code></pre>
    <p>Can anyone explain what this means?</p>
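One thing worth checking when comparing the two timings is how much work each launch actually does. In the VecAdd kernel above, each thread handles the single index `i = blockDim.x * blockIdx.x + threadIdx.x`, guarded by `if (i < N)`. The host-only C++ sketch below (no GPU required) just does that arithmetic for the two configurations in the question. The helper names `ceilDivBlocks`, `launchedThreads`, `coveredElements` and `printCoverage` are hypothetical and are not part of the CUDA SDK sample.

```cpp
#include <cstdio>

// Hypothetical helper: the grid-size formula used in the question,
// (N + threadsPerBlock - 1) / threadsPerBlock, i.e. a rounded-up division.
constexpr int ceilDivBlocks(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical helper: total number of threads started by a 1-D launch.
constexpr long long launchedThreads(int blocksPerGrid, int threadsPerBlock) {
    return static_cast<long long>(blocksPerGrid) * threadsPerBlock;
}

// Hypothetical helper: number of array elements the VecAdd kernel writes.
// Each thread handles exactly one index i and only if i < n, so the count
// is the smaller of "threads launched" and "n".
constexpr long long coveredElements(int blocksPerGrid, int threadsPerBlock, int n) {
    return launchedThreads(blocksPerGrid, threadsPerBlock) < n
               ? launchedThreads(blocksPerGrid, threadsPerBlock)
               : static_cast<long long>(n);
}

// Prints a one-line summary for a launch configuration.
void printCoverage(int blocksPerGrid, int threadsPerBlock, int n) {
    std::printf("blocks=%d threads/block=%d -> %lld threads, %lld of %d elements added\n",
                blocksPerGrid, threadsPerBlock,
                launchedThreads(blocksPerGrid, threadsPerBlock),
                coveredElements(blocksPerGrid, threadsPerBlock, n), n);
}
```

Calling `printCoverage(1, 1, 50000)` and `printCoverage(ceilDivBlocks(50000, 1024), 1024, 50000)` reports that the first configuration adds only element 0, while the second launches 49 blocks (50176 threads) and adds all 50000 elements. So the 0.0072 ms and 0.049 ms figures are timings of different amounts of work, and the first run leaves the rest of `d_C` unwritten, so the verification loop would be expected to fail for it.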