<p>Here is a solution using <code>plyr</code> with a simulated dataset.</p>

<pre><code>library(plyr)
set.seed(1001)
dat = data.frame(matrix(rnorm(1000), ncol = 10),
                 treatment = sample(c("control", "control", "treatment"),
                                    100, replace = TRUE))

# divide data set into training and test sets
tr_prop = 0.5    # proportion of full dataset to use for training
training_set = ddply(dat, .(treatment), function(., seed) {
  set.seed(seed)
  .[sample(1:nrow(.), trunc(nrow(.) * tr_prop)), ]
}, seed = 101)
test_set = ddply(dat, .(treatment), function(., seed) {
  set.seed(seed)
  .[-sample(1:nrow(.), trunc(nrow(.) * tr_prop)), ]
}, seed = 101)

# check that proportions are equal across datasets
ddply(dat, .(treatment), function(.) nrow(.) / nrow(dat))
ddply(training_set, .(treatment), function(.) nrow(.) / nrow(training_set))
ddply(test_set, .(treatment), function(.) nrow(.) / nrow(test_set))
c(nrow(training_set), nrow(test_set), nrow(dat))    # lengths of sets
</code></pre>

<p>Here, I use <code>set.seed()</code> to ensure identical behavior of <code>sample()</code> when constructing the training/test sets with <code>ddply</code>: because both calls reset the same seed inside each treatment group, they draw the same row indices, so the negative index in the second call selects exactly the rows left out of the training set. This strikes me as a bit of a hack; perhaps there is another way to achieve the same result using a single call to <code>**ply</code> (but returning two data frames). Another option (without egregious use of <code>set.seed</code>) would be to use <code>dlply</code> and then piece together elements of the resulting list into training/test sets:</p>

<pre><code>set.seed(101)    # for consistency with 'ddply' above
split_set = dlply(dat, .(treatment), function(.) {
  s = sample(1:nrow(.), trunc(nrow(.) * tr_prop))
  list(.[s, ], .[-s, ])    # first element: training rows, second: test rows
})

# join together with ldply()
training_set = ldply(split_set, function(.) .[[1]])
test_set = ldply(split_set, function(.) .[[2]])
</code></pre>