Vendor-provided LAPACK / BLAS libraries (Intel's IPP/MKL have been mentioned, but there are also AMD's ACML, and other CPU vendors like IBM/Power or Oracle/SPARC provide equivalents as well) are often highly optimized for specific CPU capabilities that can significantly boost performance on *large* datasets.

Often, though, you have very *specific*, small data to operate on (say, 4x4 matrices or 4D dot products, i.e. the operations used in 3D geometry processing), and for that sort of thing BLAS/LAPACK are overkill, because these subroutines first run tests on the properties of the data set to decide which codepath to take. In those situations, simple C/C++ source code, perhaps using SSE2...4 intrinsics and/or compiler-generated vectorization, can beat BLAS/LAPACK (a small intrinsics sketch follows the reference list below).
That's why, for example, Intel has two libraries: MKL for *large* linear algebra datasets, and IPP for *small* (graphics vectors) data sets.

In that sense:

- What exactly is your data set?
- What matrix/vector sizes?
- Which linear algebra operations?

Also, regarding "simple for loops": give the compiler the chance to vectorize for you. Something like

```c
for (i = 0; i < DIM_OF_MY_VECTOR; i += 4) {
    vecmul[i]   = src1[i]   * src2[i];
    vecmul[i+1] = src1[i+1] * src2[i+1];
    vecmul[i+2] = src1[i+2] * src2[i+2];
    vecmul[i+3] = src1[i+3] * src2[i+3];
}
for (i = 0; i < DIM_OF_MY_VECTOR; i += 4)
    dotprod += vecmul[i] + vecmul[i+1] + vecmul[i+2] + vecmul[i+3];
```

might be a better feed to a vectorizing compiler than the plain

```c
for (i = 0; i < DIM_OF_MY_VECTOR; i++)
    dotprod += src1[i] * src2[i];
```

expression. In some ways, what you mean by *calculations with for loops* will have a significant impact.
If your vector dimensions are large enough, though, the BLAS version,

```c
dotprod = CBLAS.ddot(DIM_OF_MY_VECTOR, src1, 1, src2, 1);
```

will be cleaner code and likely faster.

On the reference side, these might be of interest:

- [Intel Math Kernel Library Documentation](http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/) (LAPACK / BLAS and others, optimized for Intel CPUs)
- [Intel Integrated Performance Primitives Documentation](http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-documentation/) (optimized for small vectors / geometry processing)
- [AMD Core Math Library](http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx) (LAPACK / BLAS and others for AMD CPUs)
- [Eigen Libraries](http://eigen.tuxfamily.org) (a "nicer" linear algebra interface)
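
As promised above, here is a minimal sketch of what hand-written intrinsics can look like for the small fixed-size case (a 4D dot product as used in geometry code). It assumes single-precision, 4-element arrays and only SSE1 instructions; the helper name `dot4` is made up for illustration and is not part of any library mentioned here.

```c
/* Sketch (an assumption, not from any vendor library): a 4D single-precision
 * dot product with SSE intrinsics; the kind of small, fixed-size operation
 * where the dispatch overhead of a full BLAS/LAPACK call isn't worth paying. */
#include <xmmintrin.h>  /* SSE intrinsics */

static inline float dot4(const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);     /* load a[0..3] (unaligned load) */
    __m128 vb = _mm_loadu_ps(b);     /* load b[0..3] */
    __m128 m  = _mm_mul_ps(va, vb);  /* elementwise products m0..m3 */

    /* horizontal sum of the four partial products */
    __m128 s = _mm_add_ps(m, _mm_movehl_ps(m, m));  /* (m0+m2, m1+m3, ...) */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x55));  /* lane 0 += (m1+m3)   */
    return _mm_cvtss_f32(s);                        /* extract the scalar  */
}
```

Usage would simply be `float d = dot4(p, q);` with `p` and `q` each pointing to four floats. Note that a modern compiler will often generate comparable code from the plain loop at -O2/-O3 with vectorization enabled, so it is worth profiling before committing to intrinsics.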
 
