<p><strong>Code Review Notes</strong></p>
<ul>
<li>47.8% of your time is spent in GC.</li>
<li>1.5G is allocated on the heap (!)</li>
<li>The repa code looks a <em>lot</em> more complicated than the list code.</li>
<li>Lots of parallel GC is occurring.</li>
<li>I can get up to 300% efficiency on a <code>-N4</code> machine.</li>
<li>Putting in more type signatures will make it easier to analyze...</li>
<li><code>rsize</code> isn't used (looks expensive!)</li>
<li>You convert repa arrays to vectors. Why?</li>
<li>All your uses of <code>(**)</code> could be replaced by the cheaper <code>(^)</code> with an <code>Int</code> exponent.</li>
<li>There are a suspicious number of large, constant lists. They all have to be converted to arrays, which seems expensive.</li>
<li><code>any (==True)</code> is the same as <code>or</code>.</li>
</ul>
<p><strong>Time profiling</strong></p>
<pre><code>COST CENTRE        MODULE  %time %alloc

squared_diff       Main     25.0   27.3
insideParticle     Main     13.8   15.3
sum_squared_diff   Main      9.8    5.6
rcoords            Main      7.4    5.6
particle_extended  Main      6.8    9.0
particle_slice     Main      5.0    7.6
insideParticles    Main      5.0    4.4
yslice             Main      3.6    3.0
xslice             Main      3.0    3.0
ssd_vec            Main      2.8    2.1
**^                Main      2.6    1.4
</code></pre>
<p>The profile shows that your function <code>squared_diff</code> is a bit suspicious:</p>
<pre><code>squared_diff :: Array DIM2 Double
squared_diff = deepSeqArrays [rcoords,particle_extended]
                 ((force2 rcoords) -^ (force2 particle_extended)) **^ 2
</code></pre>
<p>However, I don't see any obvious fix.</p>
<p><strong>Space profiling</strong></p>
<p>Nothing too amazing in the space profile: you clearly see the list phase, then the vector phase.
The list phase allocates a lot, which gets reclaimed.</p>
<p><img src="https://i.stack.imgur.com/cYVl8.png" alt="enter image description here"></p>
<p>Breaking down the heap by type, we initially see a lot of lists and tuples being allocated (on demand), then a big chunk of arrays being allocated and held:</p>
<p><img src="https://i.stack.imgur.com/iLCKK.png" alt="enter image description here"></p>
<p>Again, kinda what we expected to see: the array code doesn't allocate much more than the list code (in fact, a bit less overall), but it takes a lot longer to run.</p>
<p>Checking for space leaks with <em>retainer profiling</em>:</p>
<p><img src="https://i.stack.imgur.com/h8jeB.png" alt="enter image description here"></p>
<p>There are a few interesting things there, but nothing startling. <code>zcoords</code> is retained for the whole of the list phase, then some arrays (SYSTEM) are allocated for the repa run.</p>
<p><strong>Inspecting the Core</strong></p>
<p>At this point, I'm assuming that you really did implement the same algorithms with lists and with arrays (i.e. no extra work is being done in the array case), and there's no obvious space leak. So my suspicion is badly optimized repa code. Let's look at the Core (with <a href="http://hackage.haskell.org/package/ghc-core" rel="noreferrer">ghc-core</a>).</p>
<ul>
<li>The list-based code looks fine.</li>
<li>The array code looks reasonable (i.e. unboxed primitives appear), but it is very complex, and there's a lot of it.</li>
</ul>
<p><em>Inlining all the CAFs</em></p>
<p>I added <code>INLINE</code> pragmas to all the top-level array definitions, in the hope of removing some of the CAFs and getting GHC to optimize the array code a bit harder. This really made GHC struggle to compile the module (allocating up to 4.3G and taking 10 minutes).
To me, this is a clue that GHC wasn't able to optimize this program well before, since there's new work for it to do when I increase the thresholds.</p>
<p><strong>Actions</strong></p>
<ul>
<li>Using the <code>+RTS -H</code> option (suggested heap size) can decrease the time spent in GC.</li>
<li>Try to eliminate the conversions from lists to repa arrays to vectors.</li>
<li>All those CAFs (top-level constant data structures) are kinda weird. A real program wouldn't be a list of top-level constants; in fact, this module is pathologically so, causing lots of values to be retained over long periods instead of being optimized away. Float local definitions inwards.</li>
<li>Ask for help from <a href="http://www.cse.unsw.edu.au/~benl/" rel="noreferrer">Ben Lippmeier</a>, the author of Repa, particularly since there's some funky optimization stuff happening.</li>
</ul>
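<p>Two of the cheaper rewrites from the code review notes, <code>(**)</code> to <code>(^)</code> and <code>any (==True)</code> to <code>or</code>, can be sketched in plain Haskell. This is a minimal, hypothetical example: the function names are invented for illustration rather than taken from the original program, and it works on scalars and lists rather than repa arrays.</p>

```haskell
-- Hypothetical helpers (names invented for illustration) showing two of the
-- rewrites suggested in the review notes, on plain Doubles and lists.

-- (**) :: Floating a => a -> a -> a calls the general floating-point power
-- routine, even when the exponent is a small whole number.
sqDiffPow :: Double -> Double -> Double
sqDiffPow x y = (x - y) ** 2

-- (^) :: (Num a, Integral b) => a -> b -> a takes an integral exponent and
-- computes the power by repeated multiplication, which is cheaper here.
sqDiffInt :: Double -> Double -> Double
sqDiffInt x y = (x - y) ^ (2 :: Int)

-- any (== True) compares every element with True until one matches ...
anyTrue :: [Bool] -> Bool
anyTrue = any (== True)

-- ... which is exactly what `or` already does, without the comparison.
anyTrue' :: [Bool] -> Bool
anyTrue' = or
```

<p>Loaded into GHCi, the two squared-difference functions agree up to floating-point rounding. With the exponent fixed at <code>2 :: Int</code>, GHC's rewrite rules for <code>(^)</code> typically reduce it to a single multiplication.</p>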
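<p>The advice to float local definitions inwards can be illustrated with a small, hypothetical example. The names <code>coordsCAF</code>, <code>sumSquaresCAF</code> and <code>sumSquaresLocal</code> are invented and do not come from the original module. The sketch assumes plain lists, and the same idea would apply to the repa arrays built from them.</p>

```haskell
-- Hypothetical example: a large constant list defined at top level is a CAF.
-- Once evaluated, it is kept alive as long as any code that refers to it
-- might still run, so the whole list can stay resident between uses.
coordsCAF :: [Double]
coordsCAF = [fromIntegral i * 0.5 | i <- [0 .. 99999 :: Int]]

sumSquaresCAF :: Double
sumSquaresCAF = sum [x ^ (2 :: Int) | x <- coordsCAF]

-- Floated inward and parameterised: the list is built locally from n, so it
-- is consumed as it is generated (and may be fused away entirely) and
-- becomes garbage as soon as the sum has been computed.
sumSquaresLocal :: Int -> Double
sumSquaresLocal n = sum [x ^ (2 :: Int) | x <- coords]
  where
    coords = [fromIntegral i * 0.5 | i <- [0 .. n - 1]]
```

<p>Making the local list depend on the parameter <code>n</code> matters: a constant subexpression inside a function can be floated back out to the top level by GHC's full-laziness optimization, which would turn it into a CAF again.</p>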
 
