StackOverflow2013

Efficient parallel strategies
<p>I'm trying to wrap my head around parallel strategies. I think I understand what each of the combinators does, but every time I try using them with more than 1 core, the program slows down considerably.</p>
<p>For example, a while back I tried to calculate histograms (and from them unique words) from ~700 documents. I thought that file-level granularity would be OK. With <code>-N4</code> I get a work balance of 1.70. However, with <code>-N1</code> it runs in half the time it takes with <code>-N4</code>. I'm not sure what the question is really, but I'd like to know how to decide where/when/how to parallelize and gain some understanding of it. How would this be parallelized so that the speed increases with cores instead of decreasing?</p>
<pre><code>{-# LANGUAGE OverloadedStrings #-} -- needed for the Text stop-word literals below

import Data.Map (Map)
import qualified Data.Map as M
import System.Directory
import Control.Applicative
import Data.Vector (Vector)
import qualified Data.Vector as V
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Data.Text (Text)
import System.FilePath ((</>))
import Control.Parallel.Strategies
import qualified Data.Set as S
import Data.Set (Set)
import GHC.Conc (pseq, numCapabilities)
import Data.List (foldl')

-- Map in parallel with stratm, then reduce the results using stratr.
mapReduce stratm m stratr r xs =
  let mapped = parMap stratm m xs
      reduced = r mapped `using` stratr
  in mapped `pseq` reduced

type Histogram = Map Text Int

rootDir = "/home/masse/Documents/text_conversion/"

finnishStop = ["minä", "sinä", "hän", "kuitenkin", "jälkeen", "mukaanlukien", "koska", "mutta", "jos", "kuitenkin", "kun", "kunnes", "sanoo", "sanoi", "sanoa", "miksi", "vielä", "sinun"]

englishStop =
  ["a","able","about","across","after","all","almost","also","am","among","an","and","any","are","as","at",
   "be","because","been","but","by","can","cannot","could","dear","did","do","does","either","else","ever","every",
   "for","from","get","got","had","has","have","he","her","hers","him","his","how","however","i","if","in","into",
   "is","it","its","just","least","let","like","likely","may","me","might","most","must","my","neither","no","nor",
   "not","of","off","often","on","only","or","other","our","own","rather","said","say","says","she","should","since",
   "so","some","than","that","the","their","them","then","there","these","they","this","tis","to","too","twas","us",
   "wants","was","we","were","what","when","where","which","while","who","whom","why","will","with","would","yet",
   "you","your"]

isStopWord :: Text -> Bool
isStopWord x = x `elem` (finnishStop ++ englishStop)

-- All entries in rootDir except "." and "..", as full paths.
textFiles :: IO [FilePath]
textFiles = map (rootDir </>) . filter (not . meta) <$> getDirectoryContents rootDir
  where meta "." = True
        meta ".." = True
        meta _ = False

-- Word counts for one document, with stop words removed.
histogram :: Text -> Histogram
histogram = foldr (\k -> M.insertWith' (+) k 1) M.empty . filter (not . isStopWord) . T.words

-- One histogram per file (the parallel map), merged with M.unions.
wordList = do
  files <- mapM TI.readFile =<< textFiles
  return $ mapReduce rseq histogram rseq reduce files
  where reduce = M.unions

main = do
  list <- wordList
  print $ M.size list
</code></pre>
<p>As for the text files, I'm using PDFs converted to text files, so I can't provide them, but for this purpose almost any book/books from Project Gutenberg should do.</p>
<p><strong>Edit</strong>: Added imports to script</p>
