StackOverflow2013

<p>Thank you to both talonmies and ahmad; both of their answers helped me get to a solution that worked for me. For anyone who is interested, the complete answer is the following:</p>
<pre><code>// MP 1
#include <wb.h>

// Each thread adds one pair of elements. The bounds check guards the
// threads of the last block that fall past the end of the vectors.
__global__ void vecAdd(float* in1, float* in2, float* out, int len) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < len)
        out[i] = in1[i] + in2[i];
}

int main(int argc, char** argv) {
    wbArg_t args;
    int inputLength;
    float* hostInput1;
    float* hostInput2;
    float* hostOutput;
    float* deviceInput1;
    float* deviceInput2;
    float* deviceOutput;

    args = wbArg_read(argc, argv);

    wbTime_start(Generic, "Importing data and creating memory on host");
    hostInput1 = (float*) wbImport(wbArg_getInputFile(args, 0), &inputLength);
    hostInput2 = (float*) wbImport(wbArg_getInputFile(args, 1), &inputLength);
    hostOutput = (float*) malloc(inputLength * sizeof(float));
    int size = inputLength * sizeof(float);  // buffer size in bytes
    wbTime_stop(Generic, "Importing data and creating memory on host");

    wbLog(TRACE, "The input length is ", inputLength);

    wbTime_start(GPU, "Allocating GPU memory.");
    cudaMalloc((void**)&deviceInput1, size);
    cudaMalloc((void**)&deviceInput2, size);
    cudaMalloc((void**)&deviceOutput, size);
    wbTime_stop(GPU, "Allocating GPU memory.");

    wbTime_start(GPU, "Copying input memory to the GPU.");
    cudaMemcpy(deviceInput1, hostInput1, size, cudaMemcpyHostToDevice);
    cudaMemcpy(deviceInput2, hostInput2, size, cudaMemcpyHostToDevice);
    wbTime_stop(GPU, "Copying input memory to the GPU.");

    // Round the block count up so that every element is covered.
    dim3 DimGrid((inputLength - 1) / 256 + 1, 1, 1);
    dim3 DimBlock(256, 1, 1);

    wbTime_start(Compute, "Performing CUDA computation");
    //@@ Launch the GPU Kernel
    vecAdd<<<DimGrid, DimBlock>>>(deviceInput1, deviceInput2, deviceOutput, inputLength);
    // cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() replaces it.
    cudaDeviceSynchronize();
    wbTime_stop(Compute, "Performing CUDA computation");

    wbTime_start(Copy, "Copying output memory to the CPU");
    //@@ Copy the GPU memory back to the CPU
    cudaMemcpy(hostOutput, deviceOutput, size, cudaMemcpyDeviceToHost);
    wbTime_stop(Copy, "Copying output memory to the CPU");

    wbTime_start(GPU, "Freeing GPU Memory");
    //@@ Free the GPU memory
    cudaFree(deviceInput1);
    cudaFree(deviceInput2);
    cudaFree(deviceOutput);
    wbTime_stop(GPU, "Freeing GPU Memory");

    wbSolution(args, hostOutput, inputLength);

    free(hostInput1);
    free(hostInput2);
    free(hostOutput);

    return 0;
}
</code></pre>
