<p>Have you tried <a href="http://datatable.r-forge.r-project.org/" rel="nofollow">data.table</a>?</p>
<pre><code>&gt; system.time(ans1 &lt;- do.call("cbind", lapply(subset(sampdata,select=c(a:z)),function(x)tapply(x,sampdata$groupid,sum)) ))
   user  system elapsed
906.157  13.965 928.645
&gt; require(data.table)
&gt; DT = as.data.table(sampdata)
&gt; setkey(DT,groupid)
&gt; system.time(ans2 &lt;- DT[,lapply(.SD,sum),by=groupid])
   user  system elapsed
186.920   1.056 191.582    # 4.8 times faster
&gt; # massage minor diffs in results...
&gt; ans2$groupid=NULL
&gt; ans2=as.matrix(ans2)
&gt; colnames(ans2)=letters
&gt; rownames(ans1)=NULL
&gt; identical(ans1,ans2)
[1] TRUE
</code></pre>
<p>Your example is very interesting. It is reasonably large (200MB), there are many groups (half a million), and each group is very small (2 rows). The 191s can probably be improved by quite a lot, but at least it's a start. [March 2011]</p>
<hr>
<p>This idiom (i.e. <code>lapply(.SD,...)</code>) has since been improved a lot. Here is the updated comparison, run with v1.8.2, a more recent version of R, and a faster computer than the test above:</p>
<pre><code>library(data.table)

sampdata &lt;- data.frame(id = 1:1000000)
for (letter in letters) sampdata[, letter] &lt;- rnorm(1000000)
sampdata$groupid = ceiling(sampdata$id/2)
dim(sampdata)
# [1] 1000000      28

system.time(ans1 &lt;- do.call("cbind",
  lapply(subset(sampdata,select=c(a:z)),function(x)
    tapply(x,sampdata$groupid,sum))
))
#   user  system elapsed
# 224.57    3.62  228.54

DT = as.data.table(sampdata)
setkey(DT,groupid)
system.time(ans2 &lt;- DT[,lapply(.SD,sum),by=groupid])
#   user  system elapsed
#  11.23    0.01   11.24    # 20 times faster

# massage minor diffs in results...
ans2[,groupid:=NULL]
ans2[,id:=NULL]
ans2=as.matrix(ans2)
rownames(ans1)=NULL
identical(ans1,ans2)
# [1] TRUE
</code></pre>
<p><br></p>
<pre><code>sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] data.table_1.8.2 RODBC_1.3-6
</code></pre>
 
