Note that there are some explanatory texts on larger screens.

plurals
  1. POMutable, (possibly parallel) Haskell code and performance tuning
    text
    copied!<p>I have now <a href="https://stackoverflow.com/questions/7893786/how-to-make-my-haskell-program-faster-comparison-with-c">implemented another</a> SHA3 candidate, namely Grøstl. This is still work in progress (very much so), but at the moment a 224-bit version pass all KATs. So now I'm wondering about performance (again :->). The difference this time, is that I chose to more closely mirror the <a href="http://www.groestl.info/implementations.html" rel="nofollow noreferrer">(optimized) C implementation</a>, i.e. I made a port from C to Haskell. The optimized C version use table-lookups to implement the algorithm. Furthermore the code is heavily based on updating an array containing 64-bit words. Thus I chose to use mutable unboxed vectors in Haskell. </p> <p>My Grøstl code can be found here: <a href="https://github.com/hakoja/SHA3/blob/master/Data/Digest/GroestlMutable.hs" rel="nofollow noreferrer">https://github.com/hakoja/SHA3/blob/master/Data/Digest/GroestlMutable.hs</a></p> <p>Short description of the algorithm: It's a Merkle-Damgård construction, iterating a compression function (<strong>f512M</strong> in my code) as long as there are 512-bits blocks of message left. The compression function is very simple: it simply runs two different independent 512-bit permutations <strong>P</strong> and <strong>Q</strong> (<strong>permP</strong> and <strong>permQ</strong> in my code) and combines their output. Its these permutations which are implemented by lookup tables. </p> <p><strong>Q1)</strong> The first thing that bothers me is that the use of mutable vectors makes my code look really fugly. This is my first time writing any major mutable code in Haskell so I don't really know how to improve this. Any tips on how I might better strucure the monadic code would be welcome.</p> <p><strong>Q2)</strong> The second is performance. Actually It's not too bad, because at the moment the Haskell code is only 3 times slower. Using GHC-7.2.1 and compiling as such:</p> <blockquote> <p>ghc -O2 -Odph -fllvm -optlo-O3 -optlo-loop-reduce -optlo-loop-deletion</p> </blockquote> <p>the Haskell code uses 60s. on an input of ~1GB, while the C-version uses 21-22s. But there are some things I find odd: </p> <p>(1) If I try to inline <strong><em>rnd512QM</em></strong>, the code takes 4 times longer, but if I inline <strong><em>rnd512PM</em></strong> nothing happens! Why is this happening? These two functions are virtually identical!</p> <p>(2) This is maybe more difficult. I've been experimenting with executing the two permutations in parallel. But currently to no avail. This is one example of what I tried:</p> <pre><code>f512 h m = V.force outP `par` (V.force outQ `pseq` (V.zipWith3 xor3 h outP outQ)) where xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3 inP = V.zipWith xor h m outP = permP inP outQ = permQ m </code></pre> <p>When checking the run-time statistics, and using ThreadScope, I noticed that the correct number of SPARKS was created, but almost none was actually converted to useful parallel work. Thus I gained nothing in speedup. My question then becomes: </p> <ol> <li>Are the P and Q functions just too small for the runtime to bother to run in parallel?</li> <li>If not, is my use of <strong><em>par</em></strong> and <strong><em>pseq</em></strong> (and possibly Vector.Unboxed.force) wrong?</li> <li>Would I gain anything by switching to strategies? And how would I go about doing that? </li> </ol> <p>Thank you so much for your time.</p> <p><strong>EDIT:</strong></p> <p>Sorry for not providing any real benchmark tests. The testing code in the repo was just intended for myself only. For those wanting to test the code out, you will need to compile <a href="https://github.com/hakoja/SHA3" rel="nofollow noreferrer">main.hs</a>, and then run it as: </p> <blockquote> <p>./main "algorithm" "testvariant" "byte aligned"</p> </blockquote> <p>For instance:</p> <blockquote> <p>./main groestl short224 False</p> </blockquote> <p>or</p> <blockquote> <p>./main groestl e False</p> </blockquote> <p>(<strong>e</strong> stands for "Extreme". It's the very long message provided with the NIST KATS). </p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload