
POR: split a data frame, apply a function to all row pairs in each subset
<p>I am new to R and am trying to accomplish the following task efficiently.</p>
<p>I have a <code>data.frame</code>, <code>x</code>, with columns <code>start</code>, <code>end</code>, <code>val1</code>, <code>val2</code>, <code>val3</code>, and <code>val4</code>. The rows are sorted by <code>start</code>.</p>
<p>For each <code>start</code>, I first have to find all the rows in <code>x</code> that share that <code>start</code>. Because the rows are sorted, they are consecutive. If a particular <code>start</code> occurs only once, I <i>ignore</i> it. For example, suppose one particular <code>start</code> has 3 rows, as shown below:</p>
<p>Rows for <code>start=10</code>:</p>
<pre>start end val1 val2 val3 val4
   10  25    8    9    0    0
   10  55   15  200    4    9
   10  30    4    8    0    1</pre>
<p>Then I have to take 2 rows at a time and perform a <code>fisher.test</code> on the <code>2x4</code> matrix formed by their <code>val1:val4</code> values. That is,</p>
<pre>row1:row2 => fisher.test(matrix(c(8,15,9,200,0,4,0,9), nrow=2))
row1:row3 => fisher.test(matrix(c(8,4,9,8,0,0,0,1), nrow=2))
row2:row3 => fisher.test(matrix(c(15,4,200,8,4,0,9,1), nrow=2))</pre>
<p>I wrote this using traditional <code>for</code> loops. I was wondering if it could be <b>vectorized</b> or improved in any way.</p>
<pre>tab_f_start = table(x$start) # count the rows for each start value
o_start1 = NULL
o_end1 = NULL
o_start2 = NULL
o_end2 = NULL
p_val = NULL
for (i in seq_along(tab_f_start)) {
  <b># check if there is more than one entry with the same start</b>
  if (tab_f_start[i] > 1) {
    <b># get all rows for the current start</b>
    cur_entry = x[x$start == as.integer(names(tab_f_start[i])), ]
    <b># loop over all pairs of rows to obtain p-values</b>
    ctr = tab_f_start[[i]]
    for (j in 1:(ctr-1)) {
      for (k in (j+1):ctr) {
        <b># store start and end values separately (rows of cur_entry, not x)</b>
        o_start1 = c(o_start1, cur_entry$start[j])
        o_end1 = c(o_end1, cur_entry$end[j])
        o_start2 = c(o_start2, cur_entry$start[k])
        o_end2 = c(o_end2, cur_entry$end[k])
        <b># construct the 2x4 matrix (one row per entry)</b>
        m1 = c(cur_entry$val1[j], cur_entry$val1[k])
        m2 = c(cur_entry$val2[j], cur_entry$val2[k])
        m3 = c(cur_entry$val3[j], cur_entry$val3[k])
        m4 = c(cur_entry$val4[j], cur_entry$val4[k])
        m = matrix(c(m1,m2,m3,m4), nrow=2)
        <b># keep only the p-value, not the whole test object</b>
        p_val = c(p_val, fisher.test(m)$p.value)
      }
    }
  }
}
result = data.frame(o_start1, o_end1, o_start2, o_end2, p_val)</pre>
<p>Thank you!</p>
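A minimal sketch of one loop-free way to organize this in base R, assuming `x` has numeric columns `start`, `end`, `val1` to `val4` as described. `split()` groups the rows by `start` once, singleton groups are dropped, `combn()` enumerates the row pairs within each group, and `fisher.test(m)$p.value` is collected for each pair. The function name `pairwise_fisher` and its `val_cols` argument are hypothetical, introduced only for illustration. Note that `fisher.test` is still called once per pair, so this is not true vectorization of the test itself; any speed-up likely comes from not rescanning `x` for every `start` and not growing vectors with `c()`.

```r
# Hypothetical helper (not from the question): run fisher.test on every pair
# of rows that share the same start, without growing vectors inside loops.
# Assumes x has numeric columns start, end, val1, val2, val3, val4.
pairwise_fisher = function(x, val_cols = c("val1", "val2", "val3", "val4")) {
  # One data frame per distinct start value (split() does the grouping once)
  groups = split(x, x$start)
  # Ignore start values that occur only once
  groups = groups[vapply(groups, nrow, integer(1)) > 1]

  per_group = lapply(groups, function(g) {
    # Each column of 'pairs' holds the row indices (j, k) of one pair, j < k
    pairs = combn(nrow(g), 2)
    p_val = apply(pairs, 2, function(jk) {
      # 2x4 table: one row per entry, one column per val column
      m = as.matrix(g[jk, val_cols])
      fisher.test(m)$p.value
    })
    data.frame(o_start1 = g$start[pairs[1, ]], o_end1 = g$end[pairs[1, ]],
               o_start2 = g$start[pairs[2, ]], o_end2 = g$end[pairs[2, ]],
               p_val = p_val)
  })

  # Stack the per-group results into a single data frame
  result = do.call(rbind, per_group)
  rownames(result) = NULL
  result
}
```

Because `combn(n, 2)` lists pairs as (1,2), (1,3), (2,3), ..., the rows of the result come out in the same order as the nested `j`/`k` loops in the question.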
 
