<p>Here is a solution using <code>plyr</code> with a simulated dataset.</p>
<pre><code>library(plyr)

set.seed(1001)
dat = data.frame(matrix(rnorm(1000), ncol = 10),
                 treatment = sample(c("control", "control", "treatment"),
                                    100, replace = TRUE))

# divide data set into training and test sets
tr_prop = 0.5  # proportion of full dataset to use for training
training_set = ddply(dat, .(treatment), function(., seed) {
  set.seed(seed)
  .[sample(1:nrow(.), trunc(nrow(.) * tr_prop)), ]
}, seed = 101)
test_set = ddply(dat, .(treatment), function(., seed) {
  set.seed(seed)
  .[-sample(1:nrow(.), trunc(nrow(.) * tr_prop)), ]
}, seed = 101)

# check that proportions are equal across datasets
ddply(dat, .(treatment), function(.) nrow(.)/nrow(dat))
ddply(training_set, .(treatment), function(.) nrow(.)/nrow(training_set))
ddply(test_set, .(treatment), function(.) nrow(.)/nrow(test_set))
c(nrow(training_set), nrow(test_set), nrow(dat))  # lengths of sets
</code></pre>
<p>Here, I use <code>set.seed()</code> to ensure identical behavior of <code>sample()</code> when constructing the training and test sets with <code>ddply</code>: resetting the same seed inside each call makes both calls draw the same row indices per group, so the test set is exactly the complement of the training set. This strikes me as a bit of a hack; perhaps there is another way to achieve the same result using a single call to <code>**ply</code> (but returning two dataframes). Another option (without egregious use of <code>set.seed</code>) would be to use <code>dlply</code> and then piece together elements of the resulting list into training/test sets:</p>
<pre><code>set.seed(101)  # for consistency with 'ddply' above
split_set = dlply(dat, .(treatment), function(.) {
  s = sample(1:nrow(.), trunc(nrow(.) * tr_prop))
  list(.[s, ], .[-s, ])
})

# join together with ldply()
training_set = ldply(split_set, function(.) .[[1]])
test_set = ldply(split_set, function(.) .[[2]])
</code></pre>
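<p>For comparison, the same stratified split can also be sketched in base R, without <code>plyr</code>, by sampling row indices within each treatment group and taking the complement as the test set. This is only a minimal sketch under the same simulated data as above; the names <code>train_idx</code> and <code>test_idx</code> are introduced here for illustration and are not part of the original solution.</p>

```r
# Simulated data, same construction as in the plyr example
set.seed(1001)
dat = data.frame(matrix(rnorm(1000), ncol = 10),
                 treatment = sample(c("control", "control", "treatment"),
                                    100, replace = TRUE))
tr_prop = 0.5  # proportion of each treatment group used for training

set.seed(101)
# Split the row numbers by treatment, then sample a fraction within each group.
# sample.int() on the group size avoids the surprising behavior of sample()
# when a group happens to contain a single row index.
train_idx = unlist(lapply(split(seq_len(nrow(dat)), dat$treatment),
                          function(idx) {
                            n_train = trunc(length(idx) * tr_prop)
                            idx[sample.int(length(idx), n_train)]
                          }),
                   use.names = FALSE)

# The test set is every row that was not drawn for training
test_idx = setdiff(seq_len(nrow(dat)), train_idx)

training_set = dat[train_idx, ]
test_set = dat[test_idx, ]

# Group proportions should be (nearly) equal across the three data sets
prop.table(table(dat$treatment))
prop.table(table(training_set$treatment))
prop.table(table(test_set$treatment))
```

<p>Because the indices are drawn once and the test rows are defined as their complement, a single call to <code>set.seed()</code> is enough and the two sets cannot overlap.</p>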
 
