<p>These differences can be attributed to 1) communication overhead (especially if you run across nodes) and 2) performance overhead (for example, if your job is not that intensive compared to the cost of initiating the parallelisation). Usually, if the task you are parallelising is not that time-consuming, you will find that parallelisation does NOT have much of an effect (the effect is far more visible on huge datasets).</p>

<p>Even though this may not directly address your benchmark, I hope the following example is straightforward and relatable. Here I construct a <code>data.frame</code> with <code>1e6</code> rows, <code>1e4</code> unique entries in column <code>group</code>, and some values in column <code>val</code>. I then run <code>plyr</code> both in <code>parallel</code> using <code>doMC</code> and without parallelisation.</p>

<pre><code>df &lt;- data.frame(group = as.factor(sample(1:1e4, 1e6, replace = TRUE)),
                 val = sample(1:10, 1e6, replace = TRUE))
&gt; head(df)
#   group val
# 1  8498   8
# 2  5253   6
# 3  1495   1
# 4  7362   9
# 5  2344   6
# 6  5602   9

&gt; dim(df)
# [1] 1000000       2

require(plyr)
require(doMC)
registerDoMC(20) # 20 processors

# parallelisation using doMC + plyr
P.PLYR &lt;- function() {
    o1 &lt;- ddply(df, .(group), function(x) sum(x$val), .parallel = TRUE)
}

# no parallelisation
PLYR &lt;- function() {
    o2 &lt;- ddply(df, .(group), function(x) sum(x$val), .parallel = FALSE)
}

require(rbenchmark)
benchmark(P.PLYR(), PLYR(), replications = 2, order = "elapsed")

      test replications elapsed relative user.self sys.self user.child sys.child
2   PLYR()            2   8.925    1.000     8.865    0.068      0.000     0.000
1 P.PLYR()            2  30.637    3.433    15.841   13.945      8.944    38.858
</code></pre>

<p>As you can see, the <strong>parallel</strong> version of <code>plyr</code> runs about <strong>3.4 times slower</strong>.</p>

<p>Now, let me use the same <code>data.frame</code>, but instead of computing <code>sum</code>, let me construct a slightly more demanding function, say, <code>median(.) * median(rnorm(1e4))</code> (meaningless, yes).</p>

<p>You'll see that the tide is beginning to turn:</p>

<pre><code># parallelisation using doMC + plyr
P.PLYR &lt;- function() {
    o1 &lt;- ddply(df, .(group), function(x)
        median(x$val) * median(rnorm(1e4)), .parallel = TRUE)
}

# no parallelisation
PLYR &lt;- function() {
    o2 &lt;- ddply(df, .(group), function(x)
        median(x$val) * median(rnorm(1e4)), .parallel = FALSE)
}

&gt; benchmark(P.PLYR(), PLYR(), replications = 2, order = "elapsed")
      test replications elapsed relative user.self sys.self user.child sys.child
1 P.PLYR()            2  41.911    1.000    15.265   15.369    141.585    34.254
2   PLYR()            2  73.417    1.752    73.372    0.052      0.000     0.000
</code></pre>

<p>Here, the <strong>parallel</strong> version is <code>1.752 times</code> <strong>faster</strong> than the non-parallel version.</p>

<p><strong>Edit:</strong> Following @Paul's comment, I implemented a small delay using <code>Sys.sleep()</code>. The results are as expected, but for the sake of completeness, here they are on a 20*2 <code>data.frame</code>:</p>

<pre><code>df &lt;- data.frame(group = sample(letters[1:5], 20, replace = TRUE), val = sample(20))

# parallelisation using doMC + plyr
P.PLYR &lt;- function() {
    o1 &lt;- ddply(df, .(group), function(x) {
        Sys.sleep(2)
        median(x$val)
    }, .parallel = TRUE)
}

# no parallelisation
PLYR &lt;- function() {
    o2 &lt;- ddply(df, .(group), function(x) {
        Sys.sleep(2)
        median(x$val)
    }, .parallel = FALSE)
}

&gt; benchmark(P.PLYR(), PLYR(), replications = 2, order = "elapsed")
#       test replications elapsed relative user.self sys.self user.child sys.child
# 1 P.PLYR()            2   4.116    1.000     0.056    0.056      0.024      0.04
# 2   PLYR()            2  20.050    4.871     0.028    0.000      0.000      0.00
</code></pre>

<p>The difference here is not surprising: with five groups each sleeping for 2 seconds, the serial version needs about 10 seconds per replication, while the parallel version handles the groups concurrently.</p>
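<p>The same trade-off can be sketched without <code>plyr</code> or <code>doMC</code>, using only the <code>parallel</code> package that ships with R. This is a minimal illustration, not the benchmark above: <code>compare_timing</code>, <code>cheap_task</code> and <code>slow_task</code> are hypothetical names introduced here, and the sketch assumes a Unix-alike system, since <code>mclapply</code> relies on forking (as <code>doMC</code> does). The timings will vary by machine, so none are shown.</p>

```r
library(parallel)  # part of the standard R distribution

# Hypothetical helper: run the same per-element task serially (lapply)
# and with forked workers (mclapply), returning elapsed seconds for each.
compare_timing <- function(task, n = 8, cores = 2) {
  serial   <- system.time(res_serial   <- lapply(seq_len(n), task))[["elapsed"]]
  parallel <- system.time(res_parallel <- mclapply(seq_len(n), task,
                                                   mc.cores = cores))[["elapsed"]]
  # Both approaches must produce the same results; only the timing differs.
  stopifnot(identical(res_serial, res_parallel))
  c(serial = serial, parallel = parallel)
}

# Trivial work: the cost of forking and collecting results dominates.
cheap_task <- function(i) i^2

# Artificially slow work (assumption: Sys.sleep stands in for real computation),
# so the per-call cost dwarfs the parallelisation overhead.
slow_task <- function(i) {
  Sys.sleep(0.2)
  i^2
}

cheap <- compare_timing(cheap_task)
slow  <- compare_timing(slow_task)
print(rbind(cheap = cheap, slow = slow))
```

<p>On most machines the <code>cheap</code> row should show little or no gain from the parallel run, while the <code>slow</code> row should show the parallel run finishing in roughly half the serial time with two workers.</p>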