
What parallel algorithms exist in R for working on large data?
<p>I'm trying to find out which statistical/data-mining algorithms exist in R, or in R packages on CRAN, GitHub or R-Forge, that can handle large datasets: either in parallel on one server, sequentially without running into out-of-memory issues, or across several machines at once. The goal is to evaluate whether I can easily port them to work with ff/ffbase, similar to ffbase::bigglm.ffdf.</p> <p>I would like to split them into 3 groups:</p> <ol> <li><p>Algorithms that update or work on parameter estimates in parallel</p> <ul> <li><p>Buckshot (<a href="https://github.com/lianos/buckshot" rel="nofollow">https://github.com/lianos/buckshot</a>)</p></li> <li><p>lm.fit @ Programming For Big Data (<a href="https://github.com/RBigData" rel="nofollow">https://github.com/RBigData</a>)</p></li> </ul></li> <li><p>Algorithms that work sequentially (the data is read into R, but only one process runs and only that process updates the parameters)</p> <ul> <li><p>bigglm (<a href="http://cran.r-project.org/web/packages/biglm/index.html" rel="nofollow">http://cran.r-project.org/web/packages/biglm/index.html</a>)</p></li> <li><p>Compound Poisson linear models (<a href="http://cran.r-project.org/web/packages/cplm/index.html" rel="nofollow">http://cran.r-project.org/web/packages/cplm/index.html</a>)</p></li> <li><p>Kmeans @ biganalytics (<a href="http://cran.r-project.org/web/packages/biganalytics/index.html" rel="nofollow">http://cran.r-project.org/web/packages/biganalytics/index.html</a>)</p></li> </ul></li> <li><p>Algorithms that work on part of the data</p> <ul> <li>Distributed text processing (<a href="http://www.jstatsoft.org/v51/i05/paper" rel="nofollow">http://www.jstatsoft.org/v51/i05/paper</a>)</li> </ul></li> </ol> <p>I would like to exclude simple parallelisation such as optimising over a hyperparameter, e.g. by cross-validating. Are there any other pointers to these kinds of models, optimisers or algorithms? Maybe Bayesian ones? Maybe a package called RGraphlab (http://graphlab.org/)?</p>
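To make the difference between groups 1 and 2 concrete, here is a minimal sketch for ordinary least squares, written in Python rather than R so that it is self-contained. It assumes a linear model and that each chunk is a list of (x, y) rows, with x already containing an intercept column. It accumulates the normal-equation sums X'X and X'y chunk by chunk: one process doing this over a stream of chunks corresponds to group 2, while workers summarising chunks independently and then adding the results roughly corresponds to group 1. The helper names (zero_stats, chunk_stats, merge, solve, fit_sequential, fit_parallel) are hypothetical and not part of biglm or ffbase. Accumulating X'X is also a simplification: as far as I know, biglm itself uses an incremental QR decomposition (Miller's AS 274), which is numerically more stable.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable, List, Sequence, Tuple

# Hypothetical helpers for illustration only; not part of biglm or ffbase.
# Assumption: plain normal equations, not biglm's incremental QR update.
Row = Tuple[Sequence[float], float]            # (x including intercept, y)
Stats = Tuple[List[List[float]], List[float]]  # (X'X, X'y)


def zero_stats(p: int) -> Stats:
    """Empty sufficient statistics for p coefficients."""
    return [[0.0] * p for _ in range(p)], [0.0] * p


def chunk_stats(chunk: Iterable[Row], p: int) -> Stats:
    """X'X and X'y for one chunk; the running state is only p*p + p numbers."""
    xtx, xty = zero_stats(p)
    for x, y in chunk:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def merge(a: Stats, b: Stats) -> Stats:
    """The sums are additive, so statistics from any two chunks can be combined."""
    (xa, ya), (xb, yb) = a, b
    p = len(ya)
    return ([[xa[i][j] + xb[i][j] for j in range(p)] for i in range(p)],
            [ya[i] + yb[i] for i in range(p)])


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit_sequential(chunks: Iterable[Iterable[Row]], p: int) -> List[float]:
    """Group 2: one process reads chunk after chunk and updates a running total."""
    total = zero_stats(p)
    for chunk in chunks:
        total = merge(total, chunk_stats(chunk, p))
    return solve(*total)


def fit_parallel(chunks: List[List[Row]], p: int, workers: int = 4) -> List[float]:
    """Roughly group 1: workers summarise chunks independently, results are reduced."""
    # Threads only keep the sketch self-contained; CPU-bound work would use
    # processes or separate machines instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: chunk_stats(c, p), chunks))
    return solve(*reduce(merge, parts, zero_stats(p)))


if __name__ == "__main__":
    rows = [([1.0, float(x)], 1.0 + 2.0 * x) for x in range(10)]  # y = 1 + 2x
    chunks = [rows[0:3], rows[3:6], rows[6:10]]
    print(fit_sequential(chunks, p=2))
    print(fit_parallel(chunks, p=2))
```

With rows generated from y = 1 + 2x, both fit_sequential and fit_parallel should recover coefficients close to [1.0, 2.0]. The point of the design is that the per-chunk statistics are additive, so the same reduction works whether chunks are processed one after another by a single process or summarised independently and combined afterwards.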