Building a histogram with Haskell, many times slower than with Python
<p>I was going to test naive Bayes classification. One part of it was going to be building a histogram of the training data. The problem is that I am using a large training set, the haskell-cafe mailing list going back a couple of years, and there are over 20k files in the folder.</p>

<p>It takes a little over two minutes to create the histogram with Python, and a little over 8 minutes with Haskell. I'm using Data.Map (insertWith'), enumerators and text. What else can I do to speed up the program?</p>

<p>Haskell:</p>

<pre><code>import qualified Data.Text as T
import qualified Data.Text.IO as TI
import System.Directory
import Control.Applicative
import Control.Monad (filterM, foldM)
import System.FilePath.Posix ((&lt;/&gt;))
import qualified Data.Map as M
import Data.Map (Map)
import Data.List (foldl')
import Control.Exception.Base (bracket)
import System.IO (Handle, openFile, hClose, hSetEncoding, IOMode(ReadMode), latin1)
import qualified Data.Enumerator as E
import Data.Enumerator (($$), (&gt;==&gt;), (&lt;==&lt;), (==&lt;&lt;), (&gt;&gt;==), ($=), (=$))
import qualified Data.Enumerator.List as EL
import qualified Data.Enumerator.Text as ET

-- Open a file as latin1, run the action on its handle, and always close it.
withFile' :: (Handle -&gt; IO c) -&gt; FilePath -&gt; IO c
withFile' f fp = do
  bracket
    (do h &lt;- openFile fp ReadMode
        hSetEncoding h latin1
        return h)
    hClose
    (f)

-- Fold the word counts of every regular file in directory c into one map.
buildClassHistogram c = do
  files &lt;- filterM doesFileExist =&lt;&lt; map (c &lt;/&gt; ) &lt;$&gt; getDirectoryContents c
  foldM fileHistogram M.empty files

-- Add the word counts of a single file to the map m.
fileHistogram m file = withFile' (\h -&gt; E.run_ $ enumHist h) file
  where
    enumHist h = ET.enumHandle h $$ EL.fold
      (\m' l -&gt; foldl' (\m'' w -&gt; M.insertWith' (const (+1)) w 1 m'') m' $ T.words l)
      m
</code></pre>

<p>Python:</p>

<pre><code>for filename in listdir(root):
    filepath = root + "/" + filename
    # print(filepath)
    fp = open(filepath, "r", encoding="latin-1")
    for word in fp.read().split():
        if word in histogram:
            histogram[word] = histogram[word]+1
        else:
            histogram[word] = 1
</code></pre>

<p><strong>Edit</strong>: Added imports</p>
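<p>For anyone who wants to reproduce the timing, here is a self-contained sketch of the Python side. The snippet above leaves out its setup, so the function name <code>build_histogram</code>, the use of <code>collections.Counter</code>, and the regular-file check (added to mirror <code>doesFileExist</code> in the Haskell version) are assumptions for illustration, not necessarily what the original script did.</p>

```python
import os
import tempfile
from collections import Counter


def build_histogram(root):
    """Count whitespace-separated words across every regular file in root.

    Hypothetical helper: the name and the Counter-based counting are
    assumptions; the counting rule (split on whitespace, latin-1 decoding)
    follows the snippet in the question.
    """
    histogram = Counter()
    for filename in os.listdir(root):
        filepath = os.path.join(root, filename)
        if not os.path.isfile(filepath):
            continue  # skip subdirectories, as doesFileExist does in Haskell
        # latin-1 maps every byte to a character, so decoding cannot fail
        with open(filepath, "r", encoding="latin-1") as fp:
            histogram.update(fp.read().split())
    return histogram


if __name__ == "__main__":
    # Tiny demo on a throwaway directory instead of the real mail archive.
    with tempfile.TemporaryDirectory() as demo_root:
        with open(os.path.join(demo_root, "a.txt"), "w", encoding="latin-1") as f:
            f.write("foo bar foo")
        print(build_histogram(demo_root))
```

<p>Pointing <code>build_histogram</code> at the mail archive folder should give the same counts as the loop above, provided the folder contains only regular files.</p>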