<p>The <code>apply</code> functions in R don't provide improved performance over other looping constructs (e.g. <code>for</code>). One exception is <code>lapply</code>, which can be a little faster because it does more of its work in C code than in R (see <a href="https://stackoverflow.com/questions/1692336/applying-a-function-to-a-distance-matrix-in-r">this question for an example of this</a>).</p>
<p>But in general, the rule is that <strong><em>you should use an apply function for clarity, not for performance</em></strong>.</p>
<p>I would add to this that <strong><em>apply functions have <a href="http://en.wikipedia.org/wiki/Side_effect_(computer_science)" rel="noreferrer">no side effects</a></em></strong>, which is an important distinction when it comes to functional programming with R. This can be overridden by using <code>assign</code> or <code>&lt;&lt;-</code>, but that can be very dangerous. Side effects also make a program harder to understand, since a variable's state depends on the history.</p>
<p><em>Edit:</em></p>
<p>Just to emphasize this, here is a trivial example that recursively calculates Fibonacci numbers. It could be run multiple times to get a more accurate measure, but the point is that none of the methods performs significantly differently:</p>
<pre><code>&gt; fibo &lt;- function(n) {
+     if ( n &lt; 2 ) n
+     else fibo(n-1) + fibo(n-2)
+ }
&gt; system.time(for(i in 0:26) fibo(i))
   user  system elapsed
   7.48    0.00    7.52
&gt; system.time(sapply(0:26, fibo))
   user  system elapsed
   7.50    0.00    7.54
&gt; system.time(lapply(0:26, fibo))
   user  system elapsed
   7.48    0.04    7.54
&gt; library(plyr)
&gt; system.time(ldply(0:26, fibo))
   user  system elapsed
   7.52    0.00    7.58
</code></pre>
<p><em>Edit 2:</em></p>
<p>Regarding the usage of parallel packages for R (e.g. rpvm, Rmpi, snow), these do generally provide <code>apply</code> family functions (even the <code>foreach</code> package is essentially equivalent, despite the name).
Here's a simple example of the parallel version of <code>sapply</code> in <code>snow</code>:</p>
<pre><code>library(snow)
# start two worker processes on the local machine
cl &lt;- makeSOCKcluster(c("localhost","localhost"))
# add 3 to each element of 1:20, spreading the work across the workers
parSapply(cl, 1:20, get("+"), 3)
stopCluster(cl)
</code></pre>
<p>This example uses a socket cluster, for which no additional software needs to be installed; otherwise you will need something like PVM or MPI (see <a href="http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html" rel="noreferrer">Tierney's clustering page</a>). <code>snow</code> has the following apply functions:</p>
<pre><code>parLapply(cl, x, fun, ...)
parSapply(cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl, X, MARGIN, FUN, ...)
parRapply(cl, x, fun, ...)
parCapply(cl, x, fun, ...)
</code></pre>
<p>It makes sense that <code>apply</code> functions should be used for parallel execution since they <strong><em>have no <a href="http://en.wikipedia.org/wiki/Side_effect_(computer_science)" rel="noreferrer">side effects</a></em></strong>. When you change a variable's value within a <code>for</code> loop, the change is made in the environment the loop runs in (the global environment, at top level). On the other hand, all <code>apply</code> functions can safely be used in parallel because changes are local to the function call (unless you try to use <code>assign</code> or <code>&lt;&lt;-</code>, in which case you can introduce side effects). Needless to say, it's critical to be careful about local vs. global variables, especially when dealing with parallel execution.</p>
<p><em>Edit:</em></p>
<p>Here's a trivial example to demonstrate the difference between <code>for</code> and <code>*apply</code> so far as side effects are concerned:</p>
<pre><code>&gt; df &lt;- 1:10
&gt; # *apply example
&gt; lapply(2:3, function(i) df &lt;- df * i)
&gt; df
 [1]  1  2  3  4  5  6  7  8  9 10
&gt; # for loop example
&gt; for(i in 2:3) df &lt;- df * i
&gt; df
 [1]  6 12 18 24 30 36 42 48 54 60
</code></pre>
<p>Note how the <code>df</code> in the parent environment is altered by <code>for</code> but not <code>*apply</code>.</p>
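<p>To round out the side-effect comparison, here is a minimal sketch of where the <code>lapply</code> results actually go, and of what <code>&lt;&lt;-</code> changes. The variable names (<code>scaled</code>, <code>folded</code>, <code>unchanged</code>, <code>changed</code>) and the use of <code>Reduce</code> are illustrative additions, not part of the original example; it assumes the code is run at top level, as in the session above.</p>

```r
df <- 1:10

# The assignment inside the anonymous function creates a local `df`,
# so the outer `df` is untouched; the products come back as return values.
scaled <- lapply(2:3, function(i) df <- df * i)
unchanged <- identical(df, 1:10)   # the outer df still holds 1:10

# A side-effect-free way to reproduce the for-loop result: fold the
# multipliers over the starting vector and bind the result to a new name.
folded <- Reduce(function(acc, i) acc * i, 2:3, df)

# `<<-` assigns in the enclosing environment instead, so this call does
# modify the outer `df` (the dangerous pattern mentioned in the answer).
invisible(lapply(2:3, function(i) df <<- df * i))
changed <- identical(df, folded)   # the outer df now matches the folded result
```

<p>Returning values and binding them to a new name keeps the data flow explicit, which is also what makes the same function safe to hand to a parallel <code>apply</code> variant.</p>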
 
