StackOverflow2013

Operating on pairs of elements in a data frame
<p>I have two data frames, <code>x</code> and <code>weights</code>, in which columns are paired. Here are example data frames:</p>

<pre><code>x = read.table(text = "
  yr1  yr2  yr3  yr4
   10   15    6    8
   10   20   30   NA
   NA    5    2    3
  100  100   NA   NA",
sep = "", header = TRUE)

weights = read.table(text = "
  yr1  yr2  yr3  yr4
    2    4    1    3
    2    2    4    2
    3    2    2    3
    4    2    2    4",
sep = "", header = TRUE)
</code></pre>

<p>The columns <code>yr1</code> and <code>yr2</code> are one pair and the columns <code>yr3</code> and <code>yr4</code> are another pair. With my actual data the columns go up to <code>yr100</code> and there are 50 pairs of columns.</p>

<p>If <code>yr1</code> or <code>yr2</code> is missing in <code>x</code> I want to fill the missing observation with, for example:</p>

<pre><code>(5 / 2) * 3
</code></pre>

<p>Likewise for <code>yr3</code> or <code>yr4</code>:</p>

<pre><code>(30 / 4) * 2
</code></pre>

<p>where 5 (or 30) is the element in <code>x</code> that is not missing for a given pair of elements. The values 2 and 3 in the first example (and the values 4 and 2 in the second example) are the corresponding elements in the <code>weights</code> data frame for that pair of elements in <code>x</code>. If both elements in a pair are missing in <code>x</code> I want to leave them as missing.</p>

<p>Here is <code>R</code> code that does the above operations using nested <code>for</code> loops. However, there are 2000 or 3000 rows in my actual data set and the nested <code>for</code> loops have now been running for more than 10 hours.</p>

<pre><code>for(i in 1: (ncol(x)/2)) {
  for(j in 1: nrow(x)) {

    # first element missing, second present: fill the first
    if( is.na(x[j,(1 + (i-1)*2)]) & !is.na(x[j,(1 + (i-1)*2 + 1)]))
      x[j,(1 + (i-1)*2 + 0)] = (x[j,(1 + ((i-1)*2 + 1))] / weights[j,(1 + ((i-1)*2 + 1))]) * weights[j,(1 + (i-1)*2 + 0)]

    # first element present, second missing: fill the second
    if(!is.na(x[j,(1 + (i-1)*2)]) &  is.na(x[j,(1 + (i-1)*2 + 1)]))
      x[j,(1 + (i-1)*2 + 1)] = (x[j,(1 + ((i-1)*2 + 0))] / weights[j,(1 + ((i-1)*2 + 0))]) * weights[j,(1 + (i-1)*2 + 1)]

    # both missing: leave both as NA
    if( is.na(x[j,(1 + (i-1)*2)]) &  is.na(x[j,(1 + (i-1)*2 + 1)]))
      x[j,(1 + (i-1)*2 + 0)] = NA
    if( is.na(x[j,(1 + (i-1)*2)]) &  is.na(x[j,(1 + (i-1)*2 + 1)]))
      x[j,(1 + (i-1)*2 + 1)] = NA

  }
}
</code></pre>

<p>I have realized that the third and fourth <code>if</code> statements are probably not necessary, since they only assign <code>NA</code> to elements that are already <code>NA</code>. Perhaps the time to run this code will be reduced substantially if I simply remove those two <code>if</code> statements.</p>

<p>However, I also came up with the following alternative solution that uses <code>reshape</code> instead of nested <code>for</code> loops:</p>

<pre><code>n.years <- 4

x2  <- reshape(x      , direction="long",
               varying = list(seq(1,(n.years-1),2), seq(2,n.years,2)),
               v.names = c("yr1", "yr2"), times = c("t1", "t2"))

wt2 <- reshape(weights, direction="long",
               varying = list(seq(1,(n.years-1),2), seq(2,n.years,2)),
               v.names = c("yr1", "yr2"), times = c("t1", "t2"))

x2$yr1 <- ifelse(is.na(x2$yr1), (x2$yr2 / wt2$yr2) * wt2$yr1, x2$yr1)
x2$yr2 <- ifelse(is.na(x2$yr2), (x2$yr1 / wt2$yr1) * wt2$yr2, x2$yr2)

x3 <- reshape(x2, direction="wide",
              varying = list(seq(1,3,2), seq(2,4,2)),
              v.names = c("yr1", "yr2"), times = c("t1", "t2"))
x3
</code></pre>

<p>Before I shut the current R session down and try one of the above approaches, please suggest possible alternatives that might be more efficient. I have used <code>microbenchmark</code> a little bit, but have not yet tried it here, partly because writing a function for each possible solution is a little intimidating to me. I also tried coming up with a solution using the <code>apply</code> family of functions, but could not come up with one.</p>

<p>My <code>reshape</code> solution was derived from this question:</p>

<p><a href="https://stackoverflow.com/questions/12837609/reshaping-a-data-frame-with-more-than-one-measure-variable">Reshaping a data frame with more than one measure variable</a></p>

<p>In addition to computation time I am also concerned about possible memory exhaustion.</p>

<p>I try hard to stick with base R, but will consider using other options to obtain the desired output. Thank you for any suggestions.</p>
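
<p>For reference, the fill rule described above can also be written without explicit loops by working on whole column blocks at once. The following is only a minimal sketch in base R, not a benchmarked solution. It assumes that the columns alternate strictly (odd positions hold the first member of each pair, even positions the second) and that <code>x</code> and <code>weights</code> have identical dimensions. The function name <code>fill_pairs</code> is made up for illustration.</p>

```r
# Example data from the question.
x <- read.table(text = "
  yr1  yr2  yr3  yr4
   10   15    6    8
   10   20   30   NA
   NA    5    2    3
  100  100   NA   NA", header = TRUE)

weights <- read.table(text = "
  yr1  yr2  yr3  yr4
    2    4    1    3
    2    2    4    2
    3    2    2    3
    4    2    2    4", header = TRUE)

# Hypothetical helper (not from the question): fill a missing member of each
# column pair from its partner, scaled by the ratio of the matching weights.
# Assumption: columns are ordered as pairs (1,2), (3,4), ... and `x` and
# `weights` have the same dimensions.
fill_pairs <- function(x, weights) {
  xm <- as.matrix(x)
  wm <- as.matrix(weights)
  stopifnot(identical(dim(xm), dim(wm)), ncol(xm) %% 2 == 0)

  first  <- seq(1, ncol(xm), by = 2)  # positions of yr1, yr3, ...
  second <- first + 1                 # positions of yr2, yr4, ...

  a  <- xm[, first,  drop = FALSE]
  b  <- xm[, second, drop = FALSE]
  wa <- wm[, first,  drop = FALSE]
  wb <- wm[, second, drop = FALSE]

  # Candidate fill values, computed from the original (unfilled) partner.
  # They are NA wherever the partner is NA, so pairs with both elements
  # missing stay missing.
  a_fill <- b / wb * wa
  b_fill <- a / wa * wb

  na_a <- is.na(a)
  na_b <- is.na(b)
  a[na_a] <- a_fill[na_a]
  b[na_b] <- b_fill[na_b]

  xm[, first]  <- a
  xm[, second] <- b
  as.data.frame(xm)
}

filled <- fill_pairs(x, weights)
filled
```

<p>On the example data this fills <code>yr1</code> in row 3 with (5 / 2) * 3 = 7.5 and <code>yr4</code> in row 2 with (30 / 4) * 2 = 15, and leaves the pair <code>yr3</code>, <code>yr4</code> in row 4 as <code>NA</code>. The intermediate matrices are each at most the size of <code>x</code>, so for about 3000 rows and 100 columns they hold at most 300,000 numbers apiece.</p>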