
Copying from GPU to CPU is slower than copying from CPU to GPU
<p>I started learning CUDA a while ago, and I have run into the following problem.</p>
<p>Here is what I am doing:</p>
<p><strong>Copy to the GPU</strong></p>
<pre><code>int *B;  // ...
int *dev_B;
// initialize B = 0
cudaMalloc((void**)&amp;dev_B, Nel*Nface*sizeof(int));
cudaMemcpy(dev_B, B, Nel*Nface*sizeof(int), cudaMemcpyHostToDevice);
// ...
// Execute the following kernel on the GPU; it is supposed to fill in
// the dev_B matrix with integers
findNeiborElem&lt;&lt;&lt;Nblocks, Nthreads&gt;&gt;&gt;(dev_B, dev_MSH, dev_Nel, dev_Npel, dev_Nface, dev_FC);
</code></pre>
<p><strong>Copy back to the CPU</strong></p>
<pre><code>cudaMemcpy(B, dev_B, Nel*Nface*sizeof(int), cudaMemcpyDeviceToHost);
</code></pre>
<ol>
<li>Copying array <code>B</code> to <code>dev_B</code> takes only a fraction of a second. However, copying <code>dev_B</code> back to <code>B</code> takes several seconds.</li>
<li><p>The <code>findNeiborElem</code> kernel runs a loop in each thread; it looks roughly like this:</p>
<pre><code>__global__ void findNeiborElem(int *dev_B, int *dev_MSH, int *dev_Nel, int *dev_Npel, int *dev_Nface, int *dev_FC){
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int nel = dev_Nel[0];  // Nel is read from device memory; the host variable is not visible here
    while (tid &lt; nel){
        for (int j = 1; j &lt;= nel; j++){
            // do some calculations
            dev_B[ind(tid, 1, nel)] = j;  // write through the kernel's own pointer, not the host B
            // in most cases j does not go all the way to Nel before this break is reached
            break;
        }
        tid += blockDim.x * gridDim.x;
    }
}
</code></pre></li>
</ol>
<p>What is very weird is that the time to copy <code>dev_B</code> to <code>B</code> is proportional to the number of iterations of the <code>j</code> loop.</p>
<p>For example, if <code>Nel=5</code>, the copy takes approximately <code>5 sec</code>.</p>
<p>When I increase it to <code>Nel=20</code>, it takes about <code>20 sec</code>.</p>
<p>I would expect the copy time to be independent of the number of inner iterations needed to assign the values of the matrix <code>dev_B</code>.</p>
<p>I would also expect copying the same matrix to and from the CPU to take roughly the same amount of time.</p>
<p>Do you have any idea what is wrong?</p>
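CUDA kernel launches are asynchronous with respect to the host: the `<<<...>>>` call returns before the kernel has finished, and a blocking `cudaMemcpy` on the default stream waits for earlier work in that stream before it starts copying. A host-side timer placed only around the device-to-host copy can therefore also be measuring the kernel. The following is a minimal sketch, not the asker's code, that times each phase separately with CUDA events and compares that with a host timer. `busyKernel`, `lcgSteps`, the element count, the inner-loop length and the launch configuration are all hypothetical stand-ins for `findNeiborElem` and its inputs.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line information if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Hypothetical per-element work: `work` steps of a linear congruential
// generator. Unsigned arithmetic wraps around, so there is no overflow UB.
__host__ __device__ unsigned lcgSteps(unsigned seed, int work) {
    unsigned acc = seed;
    for (int j = 0; j < work; ++j)
        acc = acc * 1664525u + 1013904223u;
    return acc;
}

// Stand-in for findNeiborElem: a grid-stride loop whose runtime grows with `work`.
__global__ void busyKernel(unsigned *out, int n, int work) {
    for (int tid = threadIdx.x + blockIdx.x * blockDim.x; tid < n;
         tid += blockDim.x * gridDim.x)
        out[tid] = lcgSteps(static_cast<unsigned>(tid), work);
}

int main() {
    using Clock = std::chrono::steady_clock;
    const int n = 1 << 20;     // hypothetical element count
    const int work = 1 << 12;  // hypothetical inner-loop length; try doubling it
    const size_t bytes = n * sizeof(unsigned);
    std::vector<unsigned> host(n, 0u);

    unsigned *dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));
    cudaEvent_t ev[4];
    for (auto &e : ev) CHECK(cudaEventCreate(&e));

    // GPU-side timeline; every operation goes to the default stream.
    CHECK(cudaEventRecord(ev[0]));
    CHECK(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(ev[1]));

    auto h0 = Clock::now();
    busyKernel<<<256, 256>>>(dev, n, work);
    CHECK(cudaGetLastError());      // reports launch errors, not execution errors
    CHECK(cudaEventRecord(ev[2]));  // queued behind the kernel
    auto h1 = Clock::now();         // host is back; the kernel may still be running

    // This blocking copy first waits for the kernel to finish, then copies.
    CHECK(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost));
    auto h2 = Clock::now();
    CHECK(cudaEventRecord(ev[3]));
    CHECK(cudaEventSynchronize(ev[3]));

    float h2dMs = 0, kernelMs = 0, d2hMs = 0;
    CHECK(cudaEventElapsedTime(&h2dMs, ev[0], ev[1]));
    CHECK(cudaEventElapsedTime(&kernelMs, ev[1], ev[2]));
    CHECK(cudaEventElapsedTime(&d2hMs, ev[2], ev[3]));

    auto ms = [](Clock::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };
    std::printf("host timer: launch %.3f ms, D2H cudaMemcpy call %.3f ms\n",
                ms(h1 - h0), ms(h2 - h1));
    std::printf("GPU events: H2D %.3f ms, kernel %.3f ms, D2H %.3f ms\n",
                h2dMs, kernelMs, d2hMs);

    // Spot-check a few results against the same function run on the CPU.
    for (int i : {0, 1, n - 1})
        if (host[i] != lcgSteps(static_cast<unsigned>(i), work))
            std::printf("mismatch at %d\n", i);

    for (auto &e : ev) CHECK(cudaEventDestroy(e));
    CHECK(cudaFree(dev));
    return 0;
}
```

Built with something like `nvcc timing.cu -o timing` (the file name is arbitrary), the host timer should show the launch returning almost immediately while the device-to-host `cudaMemcpy` call absorbs roughly the kernel time plus the copy itself; the event timings attribute each phase on its own. Calling `cudaDeviceSynchronize()` right after the launch would move that wait out of the copy and into the synchronize call.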