
How to reduce memory usage in a Haskell app?
I am new to functional programming, and am now learning Haskell. As an exercise I decided to implement the explicit Euler method for the 1D linear diffusion equation. While the code below works correctly, I am not happy with its performance. In fact, I am concerned with memory consumption. I believe that it is related to lazy evaluation, but cannot figure out how I can reduce the memory usage.

The idea of the algorithm is really simple; to make it clear in imperative terms: it takes an "array", and to every inner point it adds a value, which is calculated as a combination of the values in the point itself and in its neighbors. Boundary points are special cases.

So, this is my Euler1D.hs module:

```haskell
module Euler1D
    ( stepEuler
    , makeu0
    ) where

-- impose zero flux condition
zeroflux :: (Floating a) => a -> [a] -> [a]
zeroflux mu (boundary:inner:xs) = [boundary + mu*2*(inner - boundary)]

-- one step of integration
stepEuler :: (Floating a) => a -> [a] -> [a]
stepEuler mu u@(x:xs) = (applyBC . diffused mu) u
  where
    diffused mu (left:x:[]) = []            -- ignore outer points
    diffused mu (left:x:right:xs) =         -- integrate inner points
        (x + mu*(left + right - 2*x)) : diffused mu (x:right:xs)
    applyBC inner = lbc u' ++ inner ++ rbc u'   -- boundary conditions
      where
        u'  = [head u] ++ inner ++ [last u]
        lbc = zeroflux mu                   -- left boundary
        rbc = zeroflux mu . reverse         -- right boundary

-- initial condition
makeu0 :: Int -> [Double]
makeu0 n = [ ((^2) . sin . (pi*) . xi) x | x <- [0..n] ]
    where xi x = fromIntegral x / fromIntegral n
```

And a simple Main.hs:

```haskell
module Main where

import System ( getArgs )
import Euler1D

main = do
    args <- getArgs
    let n  = read $ head args :: Int
    let u0 = makeu0 n
    let un = stepEuler 0.5 u0
    putStrLn $ show $ sum un
```

For comparison, I also wrote [a pure C implementation](http://pastebin.com/f741f87c9).

Now, if I try to run the Haskell implementation for a sufficiently large array size `n`, I get:

```
$ time ./eulerhs 200000
100000.00000000112

real    0m3.552s
user    0m3.304s
sys     0m0.128s
```

For comparison, the C version is faster by almost two orders of magnitude:

```
$ time ./eulerc 200000
100000

real    0m0.088s
user    0m0.048s
sys     0m0.008s
```

> *EDIT*: This comparison is not really fair, because the Haskell version is compiled with profiling flags and the C version is not. If I compile both programs with `-O2` and both without profiling flags, I can increase `n`. In this case `time ./eulerhs 1000000` takes 0m2.236s, while `time ./eulerc 1000000` takes only 0m0.293s. So the problem still remains with all optimizations and without profiling; it is only offset.
>
> I would also like to note that the memory allocation of the Haskell program seems to grow linearly with `n`. This is probably OK.

But the worst part is the memory requirements: my Haskell version requires **over 100 MB** (my estimate of the bare minimum in C is **4 MB**). I think this may be the source of the problem. According to the profiling report, the program spends 85% of its time in GC, and:

```
        total time  =        0.36 secs   (18 ticks @ 20 ms)
        total alloc = 116,835,180 bytes  (excludes profiling overheads)

COST CENTRE    MODULE    %time  %alloc

makeu0         Euler1D    61.1    34.9
stepEuler      Euler1D    33.3    59.6
CAF:sum        Main        5.6     5.5
```

I was surprised to see that `makeu0` is *so* expensive. I decided that this is due to its lazy evaluation (its thunks remain in memory until the end of `stepEuler`).

I tried this change in `Main.hs`:

```haskell
let un = u0 `seq` stepEuler 0.5 u0
```

but did not notice any difference.
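If I understand correctly, the problem with this attempt is that `seq` only evaluates `u0` to weak head normal form, i.e. it forces the first cons cell of the list while all the elements stay unevaluated thunks. To force the entire list I would apparently need something like `deepseq` from the `deepseq` package. The sketch below only illustrates the idea (the helper `runStep` is my own invention, and I have not benchmarked whether this actually lowers the peak memory use or merely moves the allocations around):

```haskell
import Control.DeepSeq (deepseq)  -- from the "deepseq" package (assumed available)

import Euler1D

-- Unlike `seq`, which stops at the first cons cell, `deepseq` traverses
-- the whole list and forces every element, so no thunks produced by
-- makeu0 should survive past this point.
runStep :: Int -> Double
runStep n = sum (u0 `deepseq` stepEuler 0.5 u0)
  where u0 = makeu0 n
```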
I have no idea how to reduce memory usage in `stepEuler`, though. So, my questions are:

1. Is there a way in Haskell to build lists / do list comprehensions strictly? In this case there is no benefit in keeping them lazy.
2. How can I reduce overall memory usage in this case? I suppose I have to make something strict, but I fail to see what. In other words, if I have to put some `seq`s and bangs, where and why?
3. And finally, the most important one: what is the best strategy to identify such costly constructs? (The exact commands I used to obtain the profile above are listed at the end of this post.)

I did read the chapter on profiling and optimization in [Real World Haskell](http://book.realworldhaskell.org/read/profiling-and-optimization.html), but it remains unclear how exactly I can decide what should be strict and what not.

Please forgive me such a long post.

> *EDIT2*: As suggested by A. Rex in the comments, I tried running both programs in valgrind. This is what I observed. For the Haskell program (`n` = 200000) it found:
>
> > malloc/free: 33 allocs, 30 frees, 84,109 bytes allocated. ... checked 55,712,980 bytes.
>
> And for the C program (after a small fix):
>
> > malloc/free: 2 allocs, 2 frees, 3,200,000 bytes allocated.
>
> So, it appears that while Haskell allocates much smaller memory blocks, it does so often, and due to the delay in garbage collection they accumulate and remain in memory. So, I have another question:
>
> - Is it possible to avoid a lot of small allocations in Haskell? Basically, to declare that I need to process the whole data structure rather than only its fragments on demand. (One direction I am considering is sketched below.)
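Regarding that last point, the direction I am considering (sketched below, not yet profiled) is to replace the list with an unboxed array. As far as I understand, a `UArray` is a single contiguous, fully evaluated block of memory, strict in its elements, so it should avoid the many small cons-cell allocations. `makeu0'` is simply my renamed array variant of `makeu0`:

```haskell
import Data.Array.Unboxed (UArray, listArray, elems)

-- Build the initial condition directly as an unboxed array: one flat
-- allocation of n+1 Doubles instead of a chain of cons cells and thunks.
makeu0' :: Int -> UArray Int Double
makeu0' n = listArray (0, n) [ ((^2) . sin . (pi*) . xi) x | x <- [0..n] ]
  where xi x = fromIntegral x / fromIntegral n

-- The final sum would then be computed as: sum (elems (makeu0' n))
```

Whether `stepEuler` can be rewritten over arrays as cleanly as over lists, I do not know yet.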
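Finally, for reference, this is the profiling workflow I used to obtain the numbers above, following the Real World Haskell chapter (with newer GHCs, `-auto-all` is spelled `-fprof-auto`):

```
$ ghc -O2 -prof -auto-all --make Main.hs -o eulerhs   # build with automatic cost centres
$ ./eulerhs 200000 +RTS -p -RTS                       # writes the time/alloc report eulerhs.prof
$ ./eulerhs 200000 +RTS -hc -RTS                      # heap profile by cost centre (eulerhs.hp)
$ hp2ps -c eulerhs.hp                                 # render the heap profile to PostScript
```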
 
