How to parallelize an R script or run it in chunks
    <p>I have a data.frame and a list. My real data is very large, so the examples here are simplified versions of it.</p>
<pre><code>&gt;df
  A mac pval  P1  P2  P3  P4  P5  P6
1 a   1  0.1 0.1 0.1 0.4 0.2 0.1 0.4
2 b   1  0.2 0.1 0.4 0.2 0.1 0.2 0.2
3 c   1  0.4 0.4 0.1 0.2 0.1 0.1 0.4
4 d   2  0.1 0.1 0.7 0.5 0.1 0.7 0.1
5 e   2  0.5 0.7 0.5 0.1 0.7 0.1 0.5
6 f   2  0.7 0.5 0.5 0.7 0.1 0.7 0.1
7 g   3  0.1 0.1 0.1 0.2 0.2 0.2 0.5
8 h   3  0.2 0.2 0.1 0.5 0.2 0.2 0.5
9 i   3  0.5 0.1 0.2 0.1 0.1 0.5 0.2

ll &lt;- list(data.frame(AA=c("a","b","c","d")),
           data.frame(BB=c("e","f")),
           data.frame(CC=c("a","b","i")),
           data.frame(DD=c("d","e","f","g")))
</code></pre>
<p>Thanks to @RicardoSaporta and others I've written the following code:</p>
<pre><code>#load libraries
library(plyr)
library(data.table)

#Create a list of `df` according to `mac` value
split.mac = split(df, df$mac)
mac.pval = lapply(split.mac, '[[', 3)
df.order &lt;- df[order(df$mac),]

#Create a list of permuted pvals using elements in list `mac.pval`
l3 &lt;- list()
ll1 &lt;- length(mac.pval)
length(l3) &lt;- ll1
set.seed(4)
for (i in 1:ll1){
  vec1 &lt;- mac.pval[[i]]
  jl &lt;- 1
  jr &lt;- 1
  while (length(vec1) &lt; 4){
    if(i==1 || i-jl==0) {
      vec1 &lt;- c(vec1, mac.pval[[i+jr]])
      jr &lt;- jr+1
    } else if (i==ll1 || jr+i==ll1 ){
      vec1 &lt;- c(vec1, mac.pval[[i-jl]])
      jl &lt;- jl+1
    } else {
      vec1 &lt;- c(vec1, mac.pval[[i-jl]], mac.pval[[i+jr]])
      jl &lt;- jl+1
      jr &lt;- jr+1
    }
  }
  l3[[i]] &lt;- vec1
}

#Put same names in both lists
names(l3) &lt;- names(mac.pval)

#Create the permutations based on `l3` and add as columns to the data.frame mac.order
mac.perm &lt;- cbind(df.order,
                  t(sapply(df.order$mac,
                           function(i, l) sample(l[[as.character(i)]], 10000, replace=T),
                           l = l3)))

#Change to data.table to speed up the calculations and keep the used RAM memory low
mac.perm.dt &lt;- data.table(mac.perm, key='gene')
p.col.names &lt;- paste0("P", 1:6)
nombres = c("gene", "mac", "pval", p.col.names)
names(mac.perm.dt) &lt;- nombres
pval &lt;- "pval"

Fisher.test &lt;- function(p) {
  Xsq &lt;- -2*sum(log(p), na.rm=TRUE)
  p.val &lt;- 1-pchisq(Xsq, df = 2*sum(!is.na(p)))
  return(p.val)
}

#Apply the function `Fisher.test` to pval and permuted columns in mac.order that corresponds to elements in the list ll
results.rand &lt;- lapply(df.split, function(ll)
  mac.perm.dt[.(ll)][, lapply(.SD, Fisher.test), .SDcols=p.col.names] )
results.real &lt;- lapply(df.split, function(ll)
  mac.perm.dt[.(ll)][, lapply(.SD, Fisher.test), .SDcols=pval] )

#Calculate the permuted p-values, how many times the results in results.real are higher or equal to the elements of list L2
#Transform results.real into a list and results.rand into a matrix to speed-up calculations
L1 &lt;- as.vector(unlist(results.real))
L2 &lt;- as.matrix(rbindlist(results.rand))
perm.pval &lt;- (rowSums(L1 &gt;= L2) + 1) / (ncol(L2)+1)
names(perm.pval) &lt;- names(results.rand)
</code></pre>
<p>This is my code. My real data consists of a list of 9,000 elements, each with a <code>length(ll[i])</code> between 3 and 300, and a data.frame with 15,000 rows. I want to run a million permutations, but this is impossible in terms of RAM, even on a server with 256 GB of RAM. My idea is therefore to divide the job into chunks and store the resulting <code>perm.pval</code> objects so that I can combine them afterwards. However, the sampling has to be done separately for each chunk to avoid picking the same values every time. I could do it manually by running 100 jobs of 10,000 permutations each, in batches of 10 so that I do not reach the maximum amount of RAM I can use. I wonder whether there is a way to do this automatically, i.e., to launch a large number of R jobs from the command line but not all at the same time: run 10, wait for them to finish, and then run another 10 (I suggest this to limit RAM use).</p>
<p>Any clues are welcome.</p>
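One way to picture the chunking idea from the question: the permuted p-value is only a ratio of counts, so each chunk needs to return just the number of permuted statistics that the observed statistic is greater than or equal to, and those counts can be summed afterwards. The sketch below is a minimal illustration in base R, not the asker's code. `count_chunk`, `pools`, `chunk.sizes` and the toy values are hypothetical, and using a different `set.seed` value per chunk is assumed to be an acceptable way to keep chunks from repeating the same draws.

```r
# Fisher's combined p-value, as defined in the question
Fisher.test <- function(p) {
  Xsq <- -2 * sum(log(p), na.rm = TRUE)
  1 - pchisq(Xsq, df = 2 * sum(!is.na(p)))
}

# Hypothetical helper: run one chunk of permutations for a single gene set
# and return only a count, so the permuted values can be discarded.
count_chunk <- function(real.stat, pools, n.perm, seed) {
  set.seed(seed)  # a different seed per chunk gives different draws
  # Build a matrix with n.perm rows (permutations) and one column per gene
  draws <- lapply(pools, function(pool) sample(pool, n.perm, replace = TRUE))
  perm <- matrix(unlist(draws), nrow = n.perm)
  rand.stat <- apply(perm, 1, Fisher.test)
  sum(real.stat >= rand.stat)
}

# Toy inputs (made up): observed p-values of one 3-gene set and, per gene,
# the pool of p-values it may be resampled from (the role `l3` plays above)
real.stat <- Fisher.test(c(0.1, 0.2, 0.5))
pools <- list(c(0.1, 0.2, 0.4, 0.1, 0.5, 0.7),
              c(0.1, 0.2, 0.4, 0.1, 0.5, 0.7),
              c(0.1, 0.2, 0.5, 0.1, 0.5, 0.7))

chunk.sizes <- rep(2000, 5)  # the question's target would be rep(10000, 100)
counts <- mapply(function(n, seed) count_chunk(real.stat, pools, n, seed),
                 chunk.sizes, seq_along(chunk.sizes))

# Same formula as in the question, applied to the pooled counts
perm.pval <- (sum(counts) + 1) / (sum(chunk.sizes) + 1)
print(perm.pval)
```

Only one `n.perm` by gene-count matrix is held in memory at a time, so peak RAM depends on the chunk size rather than on the total number of permutations. Each chunk could equally be a separate `Rscript` call that reads its seed from `commandArgs(trailingOnly = TRUE)` and writes its count with `saveRDS`. For stricter guarantees about independent random streams, the `"L'Ecuyer-CMRG"` generator used by the `parallel` package is an option.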