<ul>
<li>This function is perfectly amenable to SIMD processing. Look in your compiler documentation for the intrinsic functions that correspond to the SSE instructions.</li>
<li>You could break up the dependence chain on the <code>sum</code> variable. Instead of a single <code>sum</code> accumulator, use two accumulators, <code>sum1</code> and <code>sum2</code>, alternately: one for even indices, one for odd indices. Add them together afterwards. Because the two chains are independent, the CPU can overlap their floating-point additions instead of waiting for each addition to finish before starting the next.</li>
<li>The single biggest performance bottleneck here is most likely the <code>log()</code> function. Check whether an approximation would be accurate enough. Its calculation could also be vectorized: I believe Intel published a high-performance math library that includes vectorized versions of functions like <code>log()</code>, which you may want to use.</li>
<li>You are operating on <code>float</code>s here, but <code>log()</code> works in <code>double</code> precision. Use <code>logf()</code> instead. It may or may not be faster, but it should not be slower.</li>
<li>If your compiler supports C99, add a <code>restrict</code> qualifier to the pointer parameters. This promises the compiler that those arrays do not overlap, which may help it generate more efficient code.</li>
<li>Change the way the matrices are stored in memory. Instead of an array of pointers to separate memory blocks, use a single contiguous array of M*N elements.</li>
</ul>
<p>So, putting it all together, this is how the function should look. This is portable C99. Using compiler-specific SIMD intrinsics, it could be made WAAAAY faster.</p>
<p><strong>UPDATE:</strong> Note that I changed the way the input matrices are defined.
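Each matrix is now a single, large array.</p>
<pre><code>#include &lt;math.h&gt; /* logf */

/* M is the row length (stride) of the y and z matrices. It is assumed to be
   defined elsewhere, e.g. as a compile-time constant of at least 10. */
float fnFrequentFunction(const float *restrict x, const float *restrict y,
                         const float *restrict z, const float *restrict a,
                         const float *restrict b, float *restrict c, int n)
{
    float ret = 0;
    /* Row pointers for readability. They are plain pointers: copying a restrict
       parameter into another restrict pointer in the same block is undefined. */
    const float *yy = y;
    const float *zz = z;

    for (int i = 0; i &lt; n; i++, yy += M, zz += M) // n == 1, 2, 4, or 8
    {
        float sum = 0;
        float sum2 = 0; // second accumulator breaks the dependence chain

        for (int j = 0; j &lt; 10; j += 2)
        {
            float tmp = x[j] - yy[j];
            sum += tmp * tmp * zz[j];

            float tmp2 = x[j+1] - yy[j+1];
            sum2 += tmp2 * tmp2 * zz[j+1];
        }
        sum += sum2;

        ret += (c[i] = logf(a[i] * b[i]) + sum);
    }
    return ret;
}
</code></pre>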
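<p>To make the SIMD suggestion more concrete, here is a minimal sketch of the inner <code>j</code> loop written with SSE intrinsics. It assumes an x86 target with SSE and the 10-element rows the function above works on; the helper names <code>weighted_dist10_sse</code> and <code>weighted_dist10_scalar</code> are hypothetical. Inside the <code>i</code> loop, <code>sum</code> could then be computed as <code>weighted_dist10_sse(x, yy, zz)</code>.</p>

```c
#include <xmmintrin.h> /* SSE intrinsics: _mm_loadu_ps, _mm_sub_ps, _mm_mul_ps, ... */

/* Scalar reference: sum over j of (x[j] - y[j])^2 * z[j] for 10 elements. */
static float weighted_dist10_scalar(const float *x, const float *y, const float *z)
{
    float sum = 0.0f;
    for (int j = 0; j < 10; j++) {
        float d = x[j] - y[j];
        sum += d * d * z[j];
    }
    return sum;
}

/* SSE version: elements 0..7 in two 4-wide steps, elements 8 and 9 in scalar code. */
static float weighted_dist10_sse(const float *x, const float *y, const float *z)
{
    __m128 acc = _mm_setzero_ps();
    for (int j = 0; j < 8; j += 4) {
        /* unaligned loads, since rows are not guaranteed to be 16-byte aligned */
        __m128 d = _mm_sub_ps(_mm_loadu_ps(x + j), _mm_loadu_ps(y + j));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_mul_ps(d, d), _mm_loadu_ps(z + j)));
    }

    /* horizontal sum of the four lanes */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    /* scalar tail for the last two elements */
    for (int j = 8; j < 10; j++) {
        float d = x[j] - y[j];
        sum += d * d * z[j];
    }
    return sum;
}
```

<p>The four lanes act as four independent accumulators, which is the same dependence-chain idea as <code>sum</code>/<code>sum2</code>, taken further. Because the additions happen in a different order, results can differ from the scalar loop in the last bits.</p>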
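<p>If an approximation of <code>log()</code> turns out to be good enough, one common approach (not the Intel library mentioned above, just a generic swapped-in technique) is to split the float into exponent and mantissa and approximate the log of the mantissa with a short series. The sketch below is hypothetical: the name <code>fast_logf</code> and its roughly 1e-4 absolute accuracy target are assumptions, and it only handles positive, finite, normal inputs.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate natural logarithm for positive, finite, normal floats.
   Truncation error of the series below is under about 1e-4 (absolute). */
static float fast_logf(float x)
{
    assert(x > 0.0f);

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret IEEE-754 bits */
    int e = (int)((bits >> 23) & 0xFFu) - 127; /* unbiased exponent */
    bits = (bits & 0x007FFFFFu) | 0x3F800000u; /* keep mantissa, set exponent to 0 */
    float m;
    memcpy(&m, &bits, sizeof m);               /* now x == m * 2^e, m in [1, 2) */

    if (m > 1.41421356f) {                     /* shift m into [sqrt(1/2), sqrt(2)] */
        m *= 0.5f;
        e += 1;
    }

    /* log(m) = 2*atanh(s) with s = (m - 1)/(m + 1), |s| <= 0.1716.
       Keep only the s and s^3/3 terms of the atanh series. */
    float s = (m - 1.0f) / (m + 1.0f);
    float s2 = s * s;
    return 0.69314718f * (float)e + 2.0f * s * (1.0f + s2 * (1.0f / 3.0f));
}
```

<p>In the function above this would mean replacing <code>logf(a[i] * b[i])</code> with <code>fast_logf(a[i] * b[i])</code>, assuming the product is always positive. Whether it is actually faster than your library's <code>logf()</code> depends on the platform, so measure before adopting it.</p>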
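<p>For the change in matrix layout, existing data stored as an array of row pointers can be copied once into a single contiguous block. A minimal sketch, with the helper name <code>flatten_matrix</code> being hypothetical: element (i, j) ends up at <code>flat[i * cols + j]</code>, so advancing a row pointer by <code>cols</code> moves to the next row, which is what <code>yy += M</code> relies on when <code>cols == M</code>.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical migration helper: copy a matrix stored as an array of row
   pointers into one contiguous block of nrows * cols floats.
   Element (i, j) ends up at flat[i * cols + j]. The caller frees the result. */
static float *flatten_matrix(float *const *rows, size_t nrows, size_t cols)
{
    float *flat = malloc(nrows * cols * sizeof *flat);
    if (flat == NULL)
        return NULL;
    for (size_t i = 0; i < nrows; i++)
        memcpy(flat + i * cols, rows[i], cols * sizeof *flat);
    return flat;
}
```

<p>The flattened <code>y</code> and <code>z</code> can then be passed directly to <code>fnFrequentFunction</code>. Keeping all rows in one block also means a single allocation and sequential memory access from one row to the next.</p>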
 
