Efficiently print an Array to text in Haskell (bogus, sorry)
    <p><strong>Update 2</strong>: This question is bogus. I apologize. I added some code to print an array, skipping the rest of my program logic, and it is nearly instantaneous. What I thought was slow array printing must have been slow {something} that was lazily computed after the array-print function was entered.</p> <p><strong>Update 1</strong>: The snippet below does not faithfully reproduce the performance problem in my original program. I must have misattributed the source of the slowness, either due to mis-tracking lazy evaluation, or some other change that I made when extricating the small snippet from the larger program. I will try to create a self-contained reproduction of the slow processing.</p> <p><strong>Summary:</strong> Formatting an Unboxed Array of Int32 to String, and printing the resulting ~4MB of text to disk, is inefficient, about 1 minute to print 1 million values (60K CPU cycles per value). Why? Can it be made faster?</p> <p><strong>Details:</strong></p> <p>Running as a compiled program with "-O2" compiler flag. GHC 7.0.4, Haskell Platform 2011.04, Windows 7 x64.</p> <p>I have a Haskell program that builds an array of size 100K to 1 million <code>Int32</code> values.</p> <p>The <code>Int32</code> values are actually <code>Int8</code> 4-tuple values <code>(Int8, Int8, Int8, Int8)</code> packed into an <code>Int32</code> in a (perhaps misguided) attempt at efficiency.</p> <p>I am using <a href="http://cvs.haskell.org/Hugs/pages/libraries/base/Data-Array-ST.html" rel="nofollow">STUArray</a> (but that might be a poor choice) to store and mutate the array.</p> <p>At certain points in my program, I print the array to disk, in this form:</p> <pre><code>byte byte byte ... </code></pre> <p>where each byte is an ASCII representation of the decimal value of a byte ("0" to "255"). Example:</p> <pre><code>0 255 128 5 31 ... 
</code></pre> <p>My problem is that the IO action of printing the array to a <code>Handle</code> is slow: <strong>It takes about a minute to format and print the ~1 million bytes to text on the file handle</strong>, on a modern 3.3GHz i5 CPU and SSD disk.</p> <p>My printing code is basically this:</p> <blockquote> <p>import Data.Array.Unboxed<br> import System.IO<br> import Control.Monad<br> import Data.Bits</p> <p>printArray handle frozenArray = <a href="http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Control-Monad.html#v%3amapM_" rel="nofollow">mapM</a>_ (<a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#g:19" rel="nofollow">hPutStr</a> handle . showPacked . (frozenArray !)) [0..arrayLength]</p> </blockquote> <pre><code>showPacked x = (' ':) . (shows $ (shift x (-24)) .&amp;. 255) . (' ':) . (shows $ (shift x (-16)) .&amp;. 255) . (' ':) . (shows $ (shift x (-8)) .&amp;. 255) . (' ':) . (shows $ x .&amp;. 255) $ "" </code></pre> <p><strong>Is there a better way?</strong></p> <p>What's likely to be the source of the problem? I have a few guesses (in decreasing order of likelihood):</p> <ul> <li>Formatting each Int32 with my code is inefficient.</li> <li>mapM_ over the array with (!) is inefficient.</li> <li>Calling hPutStr a million times is inefficient.</li> <li>1 minute to print 1 million values actually is efficient.</li> </ul> <p>Potential red herring warning: There is some risk that the delay I see is related to lazy computation that is forced only when I print, and not caused by formatting/printing itself, but I don't think so, because even when I print the array in the first iteration (after creating an initial blank array), it's still slow. And I can watch the output file grow slowly, so a lot of work is happening after the first byte is printed, even using a strict unboxed array.</p>
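For comparison with the guesses listed above, here is a minimal sketch of one alternative way to emit the same text: build the whole output with `Data.ByteString.Builder` and write it to the handle in a single call, instead of calling `hPutStr` once per element. This is not the code from the question. It assumes a `bytestring` release that ships the `Builder` module, which is newer than what came with GHC 7.0.4, and the names `packedBuilder`, `arrayBuilder`, and `renderArray` are hypothetical helpers introduced only for illustration. Per Update 2, printing was not the real bottleneck, so treat this as an option rather than a fix.

```haskell
import Data.Array.Unboxed (UArray, elems, listArray)
import Data.Bits (shiftR, (.&.))
import Data.ByteString.Builder (Builder, char7, hPutBuilder, toLazyByteString, word8Dec)
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int32)
import System.IO (stdout)

-- Render one packed Int32 as four space-prefixed decimal bytes,
-- most significant byte first (the same layout showPacked produces).
packedBuilder :: Int32 -> Builder
packedBuilder x = foldMap byteAt [24, 16, 8, 0]
  where
    -- Shift the wanted byte into the low 8 bits, mask it, print it as 0..255.
    byteAt s = char7 ' ' <> word8Dec (fromIntegral ((x `shiftR` s) .&. 255))

-- Concatenate the builders for every element of the array.
arrayBuilder :: UArray Int Int32 -> Builder
arrayBuilder = foldMap packedBuilder . elems

-- Pure rendering to a lazy ByteString, handy for checking the format.
renderArray :: UArray Int Int32 -> L.ByteString
renderArray = toLazyByteString . arrayBuilder

main :: IO ()
main = do
  -- Small hypothetical sample; the question's arrays hold up to ~1 million values.
  let arr = listArray (0, 2) [0x00FF8005, 0x1F000000, -1] :: UArray Int Int32
  hPutBuilder stdout (arrayBuilder arr)  -- one write call for the whole array
  putStrLn ""
```

With the sample array, the program should print ` 0 255 128 5 31 0 0 0 255 255 255 255`. For a file, the same `hPutBuilder` call would take a handle from `openFile` or `withFile` in place of `stdout`.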