    copied!<p>It was pretty straightforward to use <em>Rcpp</em> and <em>MAGMA</em> after having tinkered around with <em>culatools</em>. Here is the .cpp file:</p> <pre><code>#include&lt;Rcpp.h&gt; #include&lt;magma.h&gt; using namespace Rcpp; RcppExport SEXP gpuQR_magma(SEXP X_) { // Input NumericMatrix X(X_); // Initialize magma and cublas magma_init(); cublasInit(); // Declare variables int info, lwork, n_rows = X.nrow(), n_cols = X.ncol(), min_mn = min(n_rows, n_cols); double tmp[1]; NumericVector scale(min_mn); // Query workspace size magma_dgeqrf(n_rows, n_cols, &amp;(X[0]), n_rows, &amp;(scale[0]), &amp;(work[0]), -1, &amp;info); lwork = work[0]; NumericVector work(lwork); // Run QR decomposition magma_dgeqrf(n_rows, n_cols, &amp;(X[0]), n_rows, &amp;(scale[0]), &amp;(work[0]), lwork, &amp;info); // Scale factor result for(int ii = 1; ii &lt; n_rows; ii++) { for(int jj = 0; jj &lt; n_cols; jj++) { if(ii &gt; jj) { X[ii + jj * n_rows] *= scale[jj]; } } } // Shutdown magma and cublas magma_finalize(); cublasShutdown(); // Output return wrap(X); } </code></pre> <p>The file can be compiled from R into a shared library using:</p> <pre><code>library(Rcpp) PKG_LIBS &lt;- sprintf('-Wl,-rpath,/usr/local/magma/lib -L/usr/local/magma/lib -lmagma /usr/local/magma/lib/libmagma.a -Wl,-rpath,/usr/local/cuda-5.5/lib64 %s', Rcpp:::RcppLdFlags()) PKG_CPPFLAGS &lt;- sprintf('-DADD_ -DHAVE_CUBLAS -I/usr/local/magma/include -I/usr/local/cuda-5.5/include %s', Rcpp:::RcppCxxFlags()) Sys.setenv(PKG_LIBS = PKG_LIBS , PKG_CPPFLAGS = PKG_CPPFLAGS) R &lt;- file.path(R.home(component = 'bin'), 'R') file &lt;- '/path/gpuQR_magma.cpp' cmd &lt;- sprintf('%s CMD SHLIB %s', R, paste(file, collapse = ' ')) system(cmd) </code></pre> <p>The shared library can now be called in <em>R</em>. Comparing the results with with <em>R</em>'s <em>qr()</em> gives:</p> <pre><code>dyn.load('/path/gpuQR_magma.so') set.seed(100) n_row &lt;- 3; n_col &lt;- 3 A &lt;- matrix(rnorm(n_row * n_col), n_row, n_col) qr(A)$qr [,1] [,2] [,3] [1,] 0.5250957 -0.8666925 0.8594266 [2,] -0.2504899 -0.3878643 -0.1277838 [3,] 0.1502909 0.4742033 -0.8804247 .Call('gpuQR_magma', A) [,1] [,2] [,3] [1,] 0.5250957 -0.8666925 0.8594266 [2,] -0.2504899 -0.3878643 -0.1277838 [3,] 0.1502909 0.4742033 -0.8804247 </code></pre> <p>Below are the results from a benchmark using a NVIDIA GeForce GTX 675MX GPU with 960 CUDA cores and OpenBLAS:</p> <pre><code>n_row &lt;- 3000; n_col &lt;- 3000 A &lt;- matrix(rnorm(n_row * n_col), n_row, n_col) B &lt;- A; dim(B) &lt;- NULL res &lt;- benchmark(.Call('gpuQR_magma', A), .Call('gpuQR_cula', B, n_row, n_col), qr(A), columns = c('test', 'replications', 'elapsed', 'relative'), order = 'relative') test replications elapsed relative 2 .Call("gpuQR_cula", B, n_row, n_col) 100 18.704 1.000 1 .Call("gpuQR_magma", A) 100 70.461 3.767 3 qr(A) 100 985.411 52.685 </code></pre> <p>Seems like as if <em>MAGMA</em> is a bit slower compared to <em>culatools</em> (in this example). However, <em>MAGMA</em> comes with out-of-memory implementations and that is something I value a lot given only 1Gb of GPU memory.</p>