Is there a faster way to apply logical operations to subset a large dataset in R?
<p>First post on Stack Overflow, so be gentle if I don't get the etiquette quite right.</p>
<p>I have a big data frame (well, seven of them actually, but that isn't important) containing hands drawn from a deck of cards. I have another array that goes with it, showing which cards out of the initial hand a player chose to hold. Any cards that were not held are re-drawn from the deck. The first data frame holds all the drawn cards, so each row is between 5 and 10 columns long, corresponding to between 5 and 0 cards held. Does that make sense? For example:</p>
<pre><code>&gt; str(cards01)
'data.frame': 5044033 obs. of 10 variables
&gt; head(cards01)
   V1  V2  V3  V4  V5  V6  V7 V8 structure(c("", "", "", "", "", ""), class = "AsIs")
1  D0 D10  H0  C5  H1  S3  C4 D6
2  D5 S10  H7  C7  S0  S5 S12 H5
3  S4  H4  C1  D4 D11  H6  D1
4  C3  C9  D9 S10  S2  C7  S3 D2
5 H11  C0  C6  H3 H12 C11  S0
6 C10  C9 D11  D8  D5  S8
&gt; str(heldCards01)
 num [1:5044033, 1:5] 1 3 1 2 1 1 2 1 1 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ ..$ : chr [1:5] "1" "2" "3" "4" ...
&gt; head(heldCards01)
     1 2  3  4  5
[1,] 1 3 NA NA NA
[2,] 3 4 NA NA NA
[3,] 1 2  4 NA NA
[4,] 2 3 NA NA NA
[5,] 1 4  5 NA NA
[6,] 1 2  3  4 NA
</code></pre>
<p>What I'm doing is making a new data frame that contains only the cards the player ends up with, i.e., removing the cells in each row that aren't identified in the held-cards array. I've written code to do this, but it has now been running all weekend and still hasn't finished. This is the code I'm running (it all happens inside an lapply that goes through each of the data frame/matrix pairs I have; the bit I'm trying to optimize is inside the mclapply):</p>
<pre><code>all.hands &lt;- lapply(stakes, function(stake){
  cardsOb &lt;- get(paste("cards", stake, sep = ""))
  heldOb &lt;- get(paste("heldCards", stake, sep = ""))
  l &lt;- length(cardsOb[,1])
  mclapply(1:l, function(rowNum) {
    row &lt;- (heldOb[rowNum,])
    theNAs &lt;- as.logical(is.na(row))
    heldIndex &lt;- row[!theNAs]
    discarded &lt;- c(1,2,3,4,5)[-heldIndex]
    if(length(discarded) &gt;= 1) {
      hand &lt;- cardsOb[rowNum,-discarded]
    } else {
      hand &lt;- cardsOb[rowNum,]
    }
    hand &lt;- sort(hand)
  })
})
</code></pre>
<p>Are there any functions I'm missing that could cut out some steps? Would it be faster if the data frame were an array? Do I just have to wait for days &amp; days? I'm running on a Z620 with two Xeon E5-2407 quad-core processors and 32GB memory, if that matters.</p>
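<p>To make the intended result concrete, here is a minimal sketch of the same selection done with a logical mask over a character matrix instead of row-by-row data frame indexing. It is an illustration on toy data, not a benchmarked solution: the function name <code>final_hands</code> and its <code>n_initial</code> parameter are hypothetical, the toy values are the first three rows of the <code>head()</code> output above, and it assumes unused cells are either empty strings or <code>NA</code>.</p>

```r
# Toy data shaped like cards01 / heldCards01 (first three rows of the head() output).
# Assumption: unused trailing cells are empty strings.
cards <- rbind(
  c("D0", "D10", "H0", "C5", "H1",  "S3", "C4",  "D6", "", ""),
  c("D5", "S10", "H7", "C7", "S0",  "S5", "S12", "H5", "", ""),
  c("S4", "H4",  "C1", "D4", "D11", "H6", "D1",  "",   "", "")
)
held <- rbind(
  c(1, 3, NA, NA, NA),
  c(3, 4, NA, NA, NA),
  c(1, 2, 4,  NA, NA)
)

# Hypothetical helper: returns a list with one sorted character vector per hand.
final_hands <- function(cards, held, n_initial = 5L) {
  cards <- as.matrix(cards)  # works for a data frame or a matrix
  n <- nrow(cards)

  # keep_initial[i, j] is TRUE when card j of the initial hand was held in row i.
  keep_initial <- matrix(FALSE, nrow = n, ncol = n_initial)
  idx <- cbind(as.vector(row(held)), as.vector(held))  # (row, held position) pairs
  idx <- idx[!is.na(idx[, 2]), , drop = FALSE]         # drop the NA padding
  keep_initial[idx] <- TRUE

  # Re-drawn cards sit after the initial hand; keep every non-empty cell.
  redrawn <- cards[, -seq_len(n_initial), drop = FALSE]
  keep_redrawn <- !is.na(redrawn) & redrawn != ""

  keep <- cbind(keep_initial, keep_redrawn)

  # Transposing makes column-major extraction walk through one hand at a time.
  kept_cards <- t(cards)[t(keep)]
  hand_id <- t(row(cards))[t(keep)]

  lapply(split(kept_cards, hand_id), sort)
}

hands <- final_hands(cards, held)
print(hands[[1]])
```

<p>Because the mask is built directly from the held positions, a row in which no cards were held keeps none of the initial five. Whether this is fast enough on five million rows would need to be timed on the real data; the final <code>split</code> and <code>sort</code> still run once per hand.</p>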
 
