<p>Gaussian elimination can be seen as a two-step procedure. The first step transforms the linear system into an upper triangular one, and the second solves the resulting upper triangular system. The second step is straightforward in CUDA and can be performed efficiently with <code>cublasStrsm</code>. The first step, which you are addressing in your post, is the tricky part.</p>
<p>There are several optimized approaches to the first step. I think your approach is somewhat naive, and I recommend studying the literature to achieve decent speedups.</p>
<p>Basically, the transformation of the original system into an upper triangular one can be performed by a <em>tiling</em> approach which, in some respects, resembles the tiling used for matrix-matrix multiplication in the classical example of the CUDA C Programming Guide.</p>
<p>The tiling approach can be implemented either with purpose-written kernels or by making heavy use of cuBLAS routines.</p>
<p>Last month (November 2013), the following paper</p>
<p>Manuel Carcenac, "From tile algorithm to stripe algorithm: a CUBLAS-based parallel implementation on GPUs of Gauss method for the resolution of extremely large dense linear systems stored on an array of solid state devices", <em>Journal of Supercomputing</em>, DOI 10.1007/s11227-013-1043-3</p>
<p>proposed a tiling/striping approach based on cuBLAS.</p>
<p>All of the above approaches are summarized in a presentation available on M. Carcenac's webpage, entitled <a href="http://eng.eul.edu.tr/manuel/Course_on_Advanced_GPU_computing/6--Application_linear_system_resolution_with_Gauss_method.pdf" rel="nofollow">Application: linear system resolution with Gauss method</a>.</p>
<p>Furthermore, a downloadable Visual Studio 2010 project implementing all of them, with some performance testing, is available in the <a href="http://www.orangeowlsolutions.com/archives/721" rel="nofollow">Gaussian elimination with CUDA</a> post. With that code, you can run your own tests on your architecture of interest and see the improvement M. Carcenac's approach brings over the others.</p>
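<p>To make the two-step structure concrete, here is a minimal host-side C++ sketch, not the tiled or cuBLAS-based method from the paper. The function names <code>forward_eliminate</code> and <code>back_substitute</code> are hypothetical, the matrix is assumed to be a small, dense, row-major array of floats, and partial pivoting is my own addition for numerical safety. On a GPU, the second function is the part that a triangular solve such as <code>cublasStrsm</code> would take over.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Step 1: reduce the row-major n x n system A x = b to upper triangular
// form, in place, with partial pivoting.
// Returns false if a pivot is (numerically) zero, i.e. A looks singular.
bool forward_eliminate(std::vector<float>& A, std::vector<float>& b,
                       std::size_t n) {
    assert(A.size() == n * n && b.size() == n);
    for (std::size_t k = 0; k < n; ++k) {
        // Choose the row with the largest |entry| in column k as pivot row.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(A[i * n + k]) > std::fabs(A[p * n + k])) p = i;
        if (std::fabs(A[p * n + k]) < 1e-12f) return false;
        if (p != k) {
            for (std::size_t j = 0; j < n; ++j)
                std::swap(A[k * n + j], A[p * n + j]);
            std::swap(b[k], b[p]);
        }
        // Eliminate column k from every row below the pivot row.
        for (std::size_t i = k + 1; i < n; ++i) {
            const float m = A[i * n + k] / A[k * n + k];
            for (std::size_t j = k; j < n; ++j)
                A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }
    return true;
}

// Step 2: solve the upper triangular system U x = y by back substitution.
std::vector<float> back_substitute(const std::vector<float>& U,
                                   const std::vector<float>& y,
                                   std::size_t n) {
    assert(U.size() == n * n && y.size() == n);
    std::vector<float> x(n);
    for (std::size_t i = n; i-- > 0;) {
        float s = y[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= U[i * n + j] * x[j];
        x[i] = s / U[i * n + i];
    }
    return x;
}
```

<p>Calling <code>forward_eliminate(A, b, n)</code> and then <code>back_substitute(A, b, n)</code> solves the system. The inner elimination loops of the first function are where essentially all of the work sits, which is why the tiled approaches mentioned above concentrate on that step.</p>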