Note that there are some explanatory texts on larger screens.

plurals
  1. POWhy does this parallel code in D scale so badly?
    primarykey
    data
    text
    <p>Here is one experiment I performed comparing parallelism in C++ and D. I implemented an algorithm (a parallel label propagation scheme for community detection in networks) in both languages, using the same design: A parallel iterator gets a handle function (normally a closure) and applies it for every node in the graph. </p> <p>Here is the iterator in D, implemented using <code>taskPool</code> from <code>std.parallelism</code>:</p> <pre><code>/** * Iterate in parallel over all nodes of the graph and call handler (lambda closure). */ void parallelForNodes(F)(F handle) { foreach (node v; taskPool.parallel(std.range.iota(z))) { // call here handle(v); } } </code></pre> <p>And this is the handle function which is passed:</p> <pre><code> auto propagateLabels = (node v){ if (active[v] &amp;&amp; (G.degree(v) &gt; 0)) { integer[label] labelCounts; G.forNeighborsOf(v, (node w) { label lw = labels[w]; labelCounts[lw] += 1; // add weight of edge {v, w} }); // get dominant label label dominant; integer lcmax = 0; foreach (label l, integer lc; labelCounts) { if (lc &gt; lcmax) { dominant = l; lcmax = lc; } } if (labels[v] != dominant) { // UPDATE labels[v] = dominant; nUpdated += 1; // TODO: atomic update? G.forNeighborsOf(v, (node u) { active[u] = 1; }); } else { active[v] = 0; } } }; </code></pre> <p>The C++11 implementation is almost identical, but uses OpenMP for parallelization. So what do the scaling experiments show? </p> <p><img src="https://i.stack.imgur.com/2LloS.png" alt="scaling"></p> <p>Here I examine weak scaling, doubling the input graph size while also doubling the number of threads and measuring the running time. The ideal would be a straight line, but of course there is some overhead for parallelism. I use <code>defaultPoolThreads(nThreads)</code> in my main function to set the number of threads for the D program. The curve for C++ looks good, but the curve for D looks surprisingly bad. Am I doing something wrong w.r.t. D parallelism, or does this reflect badly on the scalability of parallel D programs?</p> <p>p.s. compiler flags</p> <p>for D: <code>rdmd -release -O -inline -noboundscheck</code></p> <p>for C++: <code>-std=c++11 -fopenmp -O3 -DNDEBUG</code></p> <p>pps. Something must be really wrong, because the D implementation is slower in parallel than sequentially:</p> <p><img src="https://i.stack.imgur.com/rtUih.png" alt="enter image description here"></p> <p>ppps. For the curious, here are the Mercurial clone urls for both implementations:</p> <ul> <li><a href="http://algohub.iti.kit.edu/parco/Prototypes/PLPd" rel="nofollow noreferrer">http://algohub.iti.kit.edu/parco/Prototypes/PLPd</a></li> <li><a href="http://algohub.iti.kit.edu/parco/Prototypes/PLPcpp" rel="nofollow noreferrer">http://algohub.iti.kit.edu/parco/Prototypes/PLPcpp</a></li> </ul>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload