StackOverflow2013


How to fill data frames in a manner dependent on values in other rows and columns in R
<p>Suppose I have a data frame that looks like this:</p>
<pre><code>ID T X Y Z
1  1 A A NA
1  2 B A NA
1  3 B B NA
1  4 B A NA
2  1 A B NA
2  2 A A NA
2  3 B A NA
2  4 A B NA
3  1 B B NA
3  2 B B NA
3  3 B B NA
3  4 B A NA
</code></pre>
<p>And I would like to replace the value of Z based on some conditionals that depend on both row and (previous) column values, so that the above ends up looking like this:</p>
<pre><code>ID T X Y Z
1  1 A A 0
1  2 B A 0
1  3 B B 1
1  4 B A NA
2  1 A B 0
2  2 A A 0
2  3 B A 0
2  4 A B 0
3  1 B B 1
3  2 B B NA
3  3 B B NA
3  4 B A NA
</code></pre>
<p>The rules:</p>
<ol>
<li>Z takes the value of 1 the first time (in order by T, and within an ID) that both X and Y on that row have the value B.<br></li>
<li>Z takes (or retains) the value NA if and only if for any smaller value of T, it has taken the value of 1 already.<br></li>
<li>When T = 1, Z takes the value of 0 if X and Y on that row do not both equal B.<br></li>
<li>When T > 1, Z takes the value of 0 if X and Y on that row do not both equal B, AND the value of Z on the previous row = zero.<br></li>
</ol>
<p>I want the following to work, and it gets me kinda close but no dice:</p>
<pre><code>df$Z <- NA
for (t in 1:4) {
  df$Z[ (df$X=="B" & df$Y=="B") & df$T==1] <- 1
  df$Z[!(df$X=="B" & df$Y=="B") & df$T==1] <- 0
  if (t>1) {
    df$Z[ (df$X=="B" & df$Y=="B") & df$T==t & (!is.na(df$Z[t-1]) & df$Z[t-1]==0)] <- 0
    df$Z[!(df$X=="B" & df$Y=="B") & df$T==t & (!is.na(df$Z[t-1]) & df$Z[t-1]==0)] <- 1
  }
}
</code></pre>
<p>On the other hand, I can write a series of nested <code>if... then</code> statements looping across all observations, but that is <em>excruciatingly</em> slow (at least, compared to the program I am translating from in Stata).</p>
<p>I am sure I am committing twelve kinds of gaffes in my attempt above, but a few hours of banging my head on this has not resolved it.</p>
<p>So I come to you begging, hat in hand.
:)</p>
<p><strong>Edit:</strong> it occurs to me that sharing the Stata code (which resolves this <em>so</em> much faster than what I have come up with in R, which is ironic, given my preference for R over Stata's language :) might help with suggestions. This does what I want, and does it fast (even with, say, N=1600, T=11):</p>
<pre><code>replace Z = .
forvalues t = 1(1)4 {
  replace Z = 1 if X == "B" & Y == "B" & T == 1
  replace Z = 0 if X == "B" & Y == "B" & T == 1
  replace Z = 1 if X == "B" & Y == "B" & T == `t' & Z[_n-1] == 0 & `t' > 1
  replace Z = 0 if X == "B" & Y == "B" & T == `t' & Z[_n-1] == 0 & `t' > 1
}
</code></pre>
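For what it's worth, the four rules above seem to boil down to a running count of "both B" rows within each ID: Z is 0 while the count is still zero, 1 on the row where it first becomes one, and NA on every row after that. Below is a minimal base R sketch of that reading, not a definitive solution. It assumes the data frame is called df as in the question and rebuilds the example data so it runs on its own; the helper names hit and n_hits are made up for illustration.

```r
# Rebuild the example data frame from the question (Z is filled in below).
df <- data.frame(
  ID = rep(1:3, each = 4),
  T  = rep(1:4, times = 3),
  X  = c("A", "B", "B", "B",  "A", "A", "B", "A",  "B", "B", "B", "B"),
  Y  = c("A", "A", "B", "A",  "B", "A", "A", "B",  "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Sort so that the cumulative sum runs in T order within each ID.
df <- df[order(df$ID, df$T), ]

# hit is 1 on rows where both X and Y equal "B", 0 otherwise (hypothetical name).
hit <- as.integer(df$X == "B" & df$Y == "B")

# Running count of hits within each ID; ave() applies cumsum per group.
n_hits <- ave(hit, df$ID, FUN = cumsum)

# Z = 0 before the first hit, 1 on the first hit, NA on every later row.
df$Z <- ifelse(n_hits == 0, 0,
               ifelse(n_hits == 1 & hit == 1, 1, NA))

print(df)
```

Because nothing here loops over rows or over values of T, it should scale to something like N=1600, T=11 without trouble, though I have not timed it against the Stata version.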
