Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<ol> <li><p>Inline the calls to lengthsq().</p></li> <li><p>Change pow(r2,-1.5) to 1/(r2*sqrt(r2)) to lower the cost of the computing r^1.5</p></li> <li><p>Use scalars (p_i_acc, etc.) inside the innner most loop rather than p[i].acc to collect your result. The compiler may not know that p[i] isn't aliased with p[j], and that might force addressing of p[i] on each loop iteration unnecessarily.</p></li> </ol> <p>4a. Try replacing the if (...) tau_q = with </p> <pre><code> tau_q=minimum(...,...) </code></pre> <p>Many compilers recognize the mininum function as one they can do with predicated operations rather than real branches, avoiding pipeline flushes.</p> <p>4b. [EDIT to split 4a and 4b apart] You might consider storing tau_..q2 instead as tau_q, and comparing against r2/v2 rather than r2*r2/v2*v2. Then you avoid doing two multiplies for each iteration in the inner loop, in trade for a single squaring operation to compute tau..q2 at the end. To do this, collect minimums of tau_q1 and tau_q2 (not squared) separately, and take the minimum of those results in a single scalar operation on completion of the loop]</p> <ol start="5"> <li>[EDIT: I suggested the following, but in fact it isn't valid for the OP's code, because of the way he updates in the loop.] Fold the two loops together. With the two loops and large enough set of particles, you thrash the cache and force a refetch from non-cache of those initial values in the second loop. The fold is trivial to do.</li> </ol> <p>Beyond this you need to consider a) loop unrolling, b) vectorizing (using SIMD instructions; either hand coding assembler or using the Intel compiler, which is supposed to be pretty good at this [but I have no experience with it], and c) going multicore (using OpenMP).</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload