StackOverflow2013


plurals

POHow to parallel a R script or run it on chunks
primarykey
Id
17599090
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2013-07-11T16:48:13.477
FavoriteCount
0
LastActivityDate
2013-07-11T20:31:15.040
LastEditDate
2013-07-11T20:31:15.040
LastEditorUserId
2380782
OwnerUserId
2380782
ParentId
0
PostTypeId
1
Score
2
ViewCount
195
LastEditorDisplayName
text
Body
I have a data.frame and a list. My real data is very large, so the examples here are a simplification of it.
<pre><code>>df
  A mac pval  P1  P2  P3  P4  P5  P6
1 a   1  0.1 0.1 0.1 0.4 0.2 0.1 0.4
2 b   1  0.2 0.1 0.4 0.2 0.1 0.2 0.2
3 c   1  0.4 0.4 0.1 0.2 0.1 0.1 0.4
4 d   2  0.1 0.1 0.7 0.5 0.1 0.7 0.1
5 e   2  0.5 0.7 0.5 0.1 0.7 0.1 0.5
6 f   2  0.7 0.5 0.5 0.7 0.1 0.7 0.1
7 g   3  0.1 0.1 0.1 0.2 0.2 0.2 0.5
8 h   3  0.2 0.2 0.1 0.5 0.2 0.2 0.5
9 i   3  0.5 0.1 0.2 0.1 0.1 0.5 0.2

ll <- list(data.frame(AA=c("a","b","c","d")), data.frame(BB=c("e","f")), data.frame(CC=c("a","b","i")), data.frame(DD=c("d","e","f","g")))
</code></pre>
Thanks to @RicardoSaporta and others I've written the following code:
<pre><code>#load libraries
library(plyr)
library(data.table)

#Create a list of `df` according to `mac` value
split.mac = split(df, df$mac)
mac.pval = lapply(split.mac, '[[', 3)
df.order <- df[order(df$mac),]

#Create a list of permuted pvals using elements in list `mac.pval`
l3 <- list()
ll1 <- length(mac.pval)
length(l3) <- ll1
set.seed(4)
for (i in 1:ll1){
  vec1 <- mac.pval[[i]]
  jl <- 1;jr<-1;
  while (length(vec1) < 4){
    if(i==1 || i-jl==0) {
      vec1 <- c(vec1, mac.pval[[i+jr]])
      jr <- jr+1
    } else if (i==ll1 || jr+i==ll1 ){
      vec1 <- c(vec1, mac.pval[[i-jl]])
      jl <- jl+1
    }else {
      vec1 <- c(vec1, mac.pval[[i-jl]], mac.pval[[i+jr]])
      jl <- jl+1
      jr <- jr+1
    }
  }
  l3[[i]] <- vec1
}

#Put same names in both lists
names(l3) <- names(mac.pval)

#Create the permutations based on `l3` and add as columns to the data.frame mac.order
mac.perm <- cbind(df.order, t(sapply(df.order$mac, function(i, l) sample(l[[as.character(i)]], 10000, replace=T), l = l3)))

#Change to data.table to speed up the calculations and keep the used RAM memory low
mac.perm.dt <- data.table(mac.perm, key='gene')
p.col.names <- paste0("P", 1:6)
nombres = c("gene", "mac", "pval", p.col.names)
names(mac.perm.dt) <- nombres
pval <- "pval"

Fisher.test <- function(p) {
  Xsq <- -2*sum(log(p), na.rm=TRUE)
  p.val <- 1-pchisq(Xsq, df = 2*sum(!is.na(p)))
  return(p.val)
}

#Apply the function `Fisher.test` to pval and permuted columns in mac.order that corresponds to elements in the list ll
results.rand <- lapply(df.split, function(ll) mac.perm.dt[.(ll)][, lapply(.SD, Fisher.test), .SDcols=p.col.names] )
results.real <- lapply(df.split, function(ll) mac.perm.dt[.(ll)][, lapply(.SD, Fisher.test), .SDcols=pval] )

#Calculate the permuted p-values, how many times the results in results.real are higher or equal to the elements of list L2
#Transform results.real into a list and results.rand into a matrix to speed-up calculations
L1 <- as.vector(unlist(results.real))
L2 <- as.matrix(rbindlist(results.rand))
perm.pval <- (rowSums(L1 >= L2) + 1) / (ncol(L2)+1)
names(perm.pval) <- names(results.rand)
</code></pre>
This is my code. My real data consists of a list of 9,000 elements, with a <code>length(ll[i])</code> between 3 and 300, and a data.frame with 15,000 rows. I want to run a million permutations, but that is impossible in terms of RAM, even on a server with 256 GB of RAM. So my idea is to divide the job into chunks and store the separate <code>perm.pval</code> objects to combine them afterwards. However, the sampling has to be done separately for each chunk to avoid picking the same values each time. I can do it manually by running 100 jobs of 10,000 permutations each, in batches of 10 so as not to exceed the maximum amount of RAM I can use. I wonder if there is a way to do this automatically, i.e., to launch a large number of R jobs from the command line but not all at the same time: run 10, wait for them to finish, and then run another 10 (I'm suggesting this to limit RAM use). Any clues are welcome.
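A minimal sketch of the chunking idea described above, under stated assumptions. The permutation p-value only needs the number of permutations in which the real statistic is greater than or equal to the permuted one, so each chunk can return a small vector of counts and discard its permuted matrix. The names `real_stat`, `permute_chunk`, `count_chunk`, and `run_chunks` are hypothetical stand-ins and not part of the code in the question. `permute_chunk` uses `runif` in place of the real sampling and Fisher step. Giving each chunk a different seed is a simple way to avoid repeating the same draws. `parallel::nextRNGStream` with the "L'Ecuyer-CMRG" generator may be a more rigorous choice.

```r
library(parallel)

# Hypothetical toy inputs: one observed statistic per gene set (the role of L1).
n_sets    <- 4
real_stat <- c(0.02, 0.40, 0.75, 0.10)

# Hypothetical stand-in for the sampling + Fisher step: returns an
# n_sets x n_perm matrix of permuted statistics (the role of L2).
permute_chunk <- function(n_perm, seed) {
  set.seed(seed)  # a different seed per chunk, so chunks do not repeat draws
  matrix(runif(n_sets * n_perm), nrow = n_sets)
}

# Process one chunk and keep only the per-set counts; the large permuted
# matrix goes out of scope when the function returns.
count_chunk <- function(k, perms_per_chunk, base_seed) {
  perm <- permute_chunk(perms_per_chunk, seed = base_seed + k)
  rowSums(real_stat >= perm)
}

# Run all chunks, at most `cores` at a time, and combine the counts.
run_chunks <- function(n_chunks, perms_per_chunk, base_seed = 4, cores = 1) {
  counts <- mclapply(seq_len(n_chunks), count_chunk,
                     perms_per_chunk = perms_per_chunk,
                     base_seed = base_seed,
                     mc.cores = cores)
  total <- Reduce(`+`, counts)
  (total + 1) / (n_chunks * perms_per_chunk + 1)
}

# 10 chunks of 1,000 permutations each, run one after another.
perm.pval <- run_chunks(n_chunks = 10, perms_per_chunk = 1000)
```

With `cores = 10`, `mclapply` would keep at most 10 chunks in flight at once on a Unix-like system (it relies on forking, so values above 1 are not supported on Windows). Peak memory would then be roughly ten chunk matrices rather than the full set of permutations.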
Tags
<r><command-line><parallel-processing>
Title
How to parallel a R script or run it on chunks
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USuser2380782
UserOwnerUserId
1. USuser2380782
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POHow to parallel a R script or run it on chunks
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POHow to parallel a R script or run it on chunks
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COI had a similar need a while back, and [asked this question which received some helpful answers](http://stackoverflow.com/questions/15837617/writing-code-to-start-an-r-session-run-r-script-terminate-session-repeat). Also, though I haven't personally tried it, [pqR--"pretty quick R"](http://radfordneal.wordpress.com/2013/06/22/announcing-pqr-a-faster-version-of-r/)--supposedly automatically parallelizes jobs and does a better job of memory management, so it might be worth a try.
 singulars
 PostPostId
 POHow to parallel a R script or run it on chunks
 UserUserId
 USsc_evans
2. COThanks @sc_evans, I've incorporated some ideas from your answered post; we'll see how it goes. BTW, pqR seems really interesting.
 singulars
 PostPostId
 POHow to parallel a R script or run it on chunks
 UserUserId
 USuser2380782
