  1. Gaussian elimination (with no pivoting) in CUDA
<p>I am trying to implement Gaussian elimination with CUDA.</p>
<p>I have an <code>N*N</code> matrix. To compute the new elements of this matrix, I use the CPU code below, where <code>C.width=N</code>:</p>
<pre><code>for (int z = 0; z &lt; C.width - 1; z++) {
    for (int c = z + 1; c &lt; C.width; c++) {
        for (int d = z; d &lt; C.width; d++) {
            C.elements[c*C.width+d] = C.elements[c*C.width+d] - (B.elements[c*C.width+z] * C.elements[z*C.width+d]);
        }
    }
}
</code></pre>
<p>I am trying to implement it with CUDA. For example, for <code>N=512</code>:</p>
<pre><code>dim3 dimBlock(16,16,1);
dim3 dimGrid(32,32,1);
MatMulKernel&lt;&lt;&lt;dimGrid, dimBlock&gt;&gt;&gt;(d_A, d_B, d_C);
</code></pre>
<p>I think that in iteration <code>i</code> I should use <code>(N-i)*N</code> threads to update the elements, that is:</p>
<pre><code>    if (idx &gt; 511 || idy &gt; 510) return;
    for (int i = 1; i &lt; 512; i++) {
        if (idx &gt;= i-1 &amp;&amp; idy &gt;= i-1)
            C.elements[(idy+1)*C.width+idx] = C.elements[(idy+1)*C.width+idx] - ((C.elements[(idy+1)*C.width+(i-1)] / C.elements[(i-1)*C.width+(i-1)]) * C.elements[(i-1)*C.width+idx]);
        __syncthreads();
    }
}
</code></pre>
<p>The results obtained on the GPU and the CPU are the same, but the GPU is only about twice as fast: <code>Time(CPU)=2*Time(GPU)</code>.</p>
<p>For <code>N=512</code>: <code>Time(CPU) = 1900 ms</code>; <code>Time(GPU) = 980 ms</code></p>
<p>For <code>N=1024</code>: <code>Time(CPU) = 14000 ms</code>; <code>Time(GPU) = 7766 ms</code></p>
<p>I think the speed-up should be larger than what I have now. Is there a mistake in my parallel code? Can you help me rewrite it?</p>
<p>Thanks for any help!</p>
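One point worth checking in the kernel above is that `__syncthreads()` is a barrier only for the threads of a single block, so it cannot order the updates made by different blocks of the 32x32 grid between iterations. A minimal sketch of one possible restructuring follows, assuming a row-major `float` matrix and nonzero pivots (there is no pivoting). It launches kernels once per pivot, so the launch boundary in the default stream serves as the grid-wide barrier, and it stores the multipliers in a separate vector so that no thread reads an element another thread is overwriting. The names `computeMultipliers`, `updateTrailing` and `gaussEliminateGpu` are hypothetical and do not come from the question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Pivot k, step 1: one thread per row below the pivot stores the
// multiplier m[r] = a[r][k] / a[k][k]. Assumes a[k][k] != 0 (no pivoting).
__global__ void computeMultipliers(const float* a, float* m, int n, int k)
{
    int r = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    if (r < n) m[r] = a[r * n + k] / a[k * n + k];
}

// Pivot k, step 2: one thread per element of the trailing block
// (rows k+1..n-1, columns k..n-1) applies a[r][c] -= m[r] * a[k][c].
// Row k and m are only read here, so the update has no data race.
__global__ void updateTrailing(float* a, const float* m, int n, int k)
{
    int c = k + blockIdx.x * blockDim.x + threadIdx.x;
    int r = k + 1 + blockIdx.y * blockDim.y + threadIdx.y;
    if (r < n && c < n) a[r * n + c] -= m[r] * a[k * n + c];
}

// Hypothetical host wrapper: reduces the row-major n x n matrix in `a`
// to upper-triangular form in place. Returns false if a CUDA call fails.
bool gaussEliminateGpu(std::vector<float>& a, int n)
{
    float* dA = nullptr;
    float* dM = nullptr;
    size_t bytes = sizeof(float) * n * n;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dM, sizeof(float) * n) != cudaSuccess) {
        cudaFree(dA);
        return false;
    }
    cudaError_t err = cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    for (int k = 0; err == cudaSuccess && k < n - 1; ++k) {
        int rows = n - k - 1;  // rows below the pivot
        int cols = n - k;      // columns from the pivot to the right edge
        computeMultipliers<<<(rows + 255) / 256, 256>>>(dA, dM, n, k);
        dim3 grid((cols + block.x - 1) / block.x,
                  (rows + block.y - 1) / block.y);
        // Launches in the same stream run in order, so this kernel
        // sees the finished multipliers and the finished previous step.
        updateTrailing<<<grid, block>>>(dA, dM, n, k);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(a.data(), dA, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dM);
    return err == cudaSuccess;
}
```

In this sketch the grid shrinks with the trailing block, so no threads are launched for rows and columns that are already eliminated. Whether it is faster than the single-kernel version depends on the device and on `N`, since each pivot now costs two kernel launches.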