
Multicore and memory usage in R under Ubuntu
<p>I am running R on an Ubuntu workstation with 8 virtual cores and 8 GB of RAM. I was hoping to routinely use the multicore package to make use of the 8 cores in parallel; however, I find that the whole R process gets duplicated 8 times. Since R actually seems to use much more memory than gc() reports (by a factor of 5, even after calling gc()), this means that even a relatively mild memory footprint (one 200 MB object) becomes intractably memory-heavy once duplicated 8 times. I looked into bigmemory to have the child processes share the same memory space, but it would require major rewriting of my code, as it doesn't deal with data frames.</p> <p>Is there a way to make R as lean as possible before forking, i.e. have the OS reclaim as much memory as possible?</p> <p>EDIT: I think I understand what is going on now. The problem is not where I thought it was: objects that exist in the parent process and are not manipulated do not get duplicated eight times. Instead, my problem, I believe, came from the nature of the manipulation each child process performs. Each has to manipulate a big factor with hundreds of thousands of levels, and I think <em>this</em> is the memory-heavy bit. As a result, it is indeed the case that the overall memory load is proportional to the number of cores, just not as dramatically as I thought. Another lesson I learned is that with 4 physical cores plus the possibility of hyperthreading, hyperthreading is typically not a good idea for R: the gain is minimal, and the memory cost may be non-trivial.
So I'll be working on 4 cores from now on.</p> <p>For those who would like to experiment, this is the type of code I was running:</p> <pre><code># Create data
sampdata &lt;- data.frame(id = 1:1000000)
for (letter in letters) {
  sampdata[, letter] &lt;- rnorm(1000000)
}
sampdata$groupid &lt;- ceiling(sampdata$id / 2)

# Enable multicore
library(multicore)
options(cores = 4)  # number of cores to distribute the job to

# Actual job
system.time(do.call("cbind",
  mclapply(subset(sampdata, select = c(a:z)),
           function(x) tapply(x, sampdata$groupid, sum))
))
</code></pre>
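The claim in the edit — that the tapply() over a factor with hundreds of thousands of levels is the memory-heavy step — can be checked in a plain, non-forked session. A minimal sketch using only base R, with the same 500,000-group structure as the example above:

```r
# Sketch: measure the allocation cost of a tapply() over a factor with
# hundreds of thousands of levels -- the step identified above as the
# memory-heavy part of each child process.
set.seed(1)
n <- 1000000
x <- rnorm(n)
groupid <- ceiling(seq_len(n) / 2)   # 500,000 distinct groups, as in the example

invisible(gc(reset = TRUE))          # reset the "max used" counters
res <- tapply(x, groupid, sum)       # coerces groupid to a 500,000-level factor
peak <- gc()                         # the "max used" column now shows the peak

length(res)  # one sum per group: 500000
```

Comparing the "max used" column before and after the call shows how much transient allocation the factor coercion and split add on top of the input vector itself, which is what gets multiplied by the number of forked workers.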
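On the hyperthreading point: the physical (as opposed to logical) core count can be queried rather than hard-coded. A sketch using the parallel package, which ships with R (>= 2.14) and absorbed the multicore package; its mclapply() takes the worker count directly via mc.cores, the equivalent of options(cores=) above:

```r
# Sketch: size the worker pool from the physical core count instead of
# hard-coding 4, so hyperthreaded "virtual" cores are not oversubscribed.
library(parallel)

physical <- detectCores(logical = FALSE)  # may be NA on some platforms
workers  <- if (is.na(physical)) 1L else physical

# Trivial job to show the call shape; each element runs in a forked child.
res <- mclapply(1:8, function(i) i^2, mc.cores = workers)
unlist(res)  # 1 4 9 16 25 36 49 64
```

On a machine like the one described (8 logical, 4 physical cores), this yields 4 workers, matching the conclusion the asker reached by experiment.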
 
