Note that there are some explanatory texts on larger screens.

plurals
  1. POworkflow for creating timevarying covariates in r
    primarykey
    data
    text
    <p>I have a huge data file in long format-parts of it supplied below. Each ID can have several rows, where status is the final status. However I need to do the analysis with time varying covariates and hence need to create two new time variables and update the status variable. I´ve been struggling with this for some time now and I cannot figure out how to do this efficiently as there can be as many as four rows per ID. The time varying variable is <code>NUM.AFTER.DIAG</code>. If <code>NUM.AFTER.DIAG==0</code> then it is easy, where <code>time1=0</code> and <code>time2=STATUSDATE</code>. However when <code>NUM.AFTER.DIAG==1</code> then I need to make a new row where <code>time1=0</code>, <code>time2=DOB-DATE.DIAG</code> and <code>NUM.AFTER.DIAG=0</code> and also make sure <code>STATUS="B"</code>. The second row would then be <code>time1=time2</code> from the previous row and <code>time2=STATUSDATE-DATE.DIAG-time1</code> from this row. Equally if there are more rows then the different rows needs to be subtracted from each other. Also if NUM.AFTER.DIAG==0 but there are multiple rows then all extra rows can be deleted.</p> <p>Any ideas for an efficient solution to this? I´ve looked at john Fox unfold command, but it assumes that all the intervals are in wide format to begin with.</p> <p>Edit: The table as requested. As for the censor variable: "D"=event (death)</p> <p><img src="https://i.stack.imgur.com/wvcx7.png" alt="enter image description here"></p> <pre><code> structure(list(ID = c(187L, 258L, 265L, 278L, 281L, 281L, 283L, 283L, 284L, 291L, 292L, 292L, 297L, 299L, 305L, 305L, 311L, 311L, 319L, 319L, 319L, 322L, 322L, 329L, 329L, 333L, 333L, 333L, 334L, 334L), STATUS = c("D", "B", "B", "B", "B", "B", "D", "D", "B", "B", "B", "B", "D", "D", "D", "D", "B", "B", "B", "B", "B", "D", "D", "B", "B", "D", "D", "D", "D", "D"), STATUSDATE = structure(c(11153, 15034, 15034, 15034, 15034, 15034, 5005, 5005, 15034, 15034, 15034, 15034, 6374, 5005, 7562, 7562, 15034, 15034, 15034, 15034, 15034, 7743, 7743, 15034, 15034, 4670, 4670, 4670, 5218, 5218 ), class = "Date"), DATE.DIAG = structure(c(4578, 4609, 4578, 4487, 4670, 4670, 4517, 4517, 4640, 4213, 4397, 4397, 4397, 4487, 4213, 4213, 4731, 4731, 4701, 4701, 4701, 4397, 4397, 4578, 4578, 4275, 4275, 4275, 4456, 4456), class = "Date"), DOB = structure(c(NA, 13010, NA, NA, -1082, -626, 73, 1353, 13679, NA, 1626, 3087, -626, -200, 2814, 3757, 1930, 3787, 6740, 13528, 14167, 5462, 6557, 7865, 9235, -901, -504, -108, -535, -78), class = "Date"), NUM.AFTER.DIAG = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 1, 2, 1, 2, 0, 0, 0, 0, 0)), .Names = c("ID", "STATUS", "STATUSDATE", "DATE.DIAG", "DOB", "NUM.AFTER.DIAG"), row.names = c(NA, 30L), class = "data.frame") </code></pre> <p>EDIT: I did come up with a solution, although probably not very efficient.</p> <pre><code> u1&lt;-ddply(p,.(ID),function(x) { if (x$NUM.AFTER.DIAG==0){ x$time1&lt;-0 x$time2&lt;-x$STATUSDATE-x$DATE.DIAG x&lt;-x[1,] } else { x&lt;-rbind(x,x[1,]) x&lt;-x[order(x$DOB),] u&lt;-max(x$NUM.AFTER.DIAG) x$NUM.AFTER.DIAG&lt;-0:u x$time1[1]&lt;-0 x$time2[1:(u)]&lt;-x$DOB[2:(u+1)]-x$DATE.DIAG[2:(u+1)] x$time2[u+1]&lt;-x$STATUSDATE[u]-x$DATE.DIAG[u] x$time1[2:(u+1)]&lt;-x$time2[1:u] x$STATUS[1:u]&lt;-"B" } x } ) </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload