Vector step addition slower on CUDA
<p>I am trying to run a vector step addition function in CUDA C++, but even for large float arrays of size 5,000,000 it runs slower than my CPU version. Below is the relevant CUDA and CPU code:</p>
<pre><code>#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;
#include &lt;cuda_runtime.h&gt;

#define THREADS_PER_BLOCK 1024
typedef float real;

// Each thread computes one strided element: x[i*xstep] = alpha*y[i*ystep] + beta*z[i*zstep].
__global__ void vectorStepAddKernel2(real *x, real *y, real *z, real alpha, real beta,
                                     int size, int xstep, int ystep, int zstep)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i &lt; size)
    {
        x[i*xstep] = alpha * y[i*ystep] + beta * z[i*zstep];
    }
}

// GPU function: x, y and z are device pointers.
cudaError_t vectorStepAdd2(real *x, real *y, real *z, real alpha, real beta,
                           int size, int xstep, int ystep, int zstep)
{
    cudaError_t cudaStatus;
    int threadsPerBlock = THREADS_PER_BLOCK;
    int blocksPerGrid = (size + threadsPerBlock - 1) / threadsPerBlock;

    vectorStepAddKernel2&lt;&lt;&lt;blocksPerGrid, threadsPerBlock&gt;&gt;&gt;(x, y, z, alpha, beta, size, xstep, ystep, zstep);

    // cudaDeviceSynchronize waits for the kernel to finish, and returns
    // any errors encountered during the launch.
    cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess)
    {
        fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching vectorStepAddKernel2!\n", cudaStatus);
        exit(1);
    }

    return cudaStatus;
}

// CPU function:
void vectorStepAdd3(real *x, real *y, real *z, real alpha, real beta,
                    int size, int xstep, int ystep, int zstep)
{
    for (int i = 0; i &lt; size; i++)
    {
        x[i*xstep] = alpha * y[i*ystep] + beta * z[i*zstep];
    }
}
</code></pre>
<p>Calling vectorStepAdd2 results in slower computation than vectorStepAdd3 when each of the 3 arrays is of size 5,000,000 and size=50,000 (i.e., 50,000 elements are added together in this step-wise manner).</p>
<p>Any ideas on what I can do to speed up the GPU code? My device is a Tesla M2090 GPU.</p>
<p>Thanks</p>
 
