
Replacing NAs with latest non-NA value
<p>In a <code>data.frame</code> (or <code>data.table</code>), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using a vector (instead of a <code>data.frame</code>), is the following:</p>
<pre><code>&gt; y &lt;- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
</code></pre>
<p>I would like a function <code>fill.NAs()</code> that allows me to construct <code>yy</code> such that:</p>
<pre><code>&gt; yy
[1] NA  2  2  2  2  3  3  4  4  4
</code></pre>
<p>I need to repeat this operation for many small <code>data.frame</code>s (~30-50 Mb each, ~1 Tb in total), where a row counts as NA if all of its entries are NA. What is a good way to approach the problem?</p>
<p>The ugly solution I cooked up uses this function:</p>
<pre><code>last &lt;- function (x){
    x[length(x)]
}

fill.NAs &lt;- function(isNA){
    if (isNA[1] == 1) {
        isNA[1:max({which(isNA==0)[1]-1},1)] &lt;- 0 # first is NAs
                                                   # can't be forward filled
    }
    isNA.neg &lt;- isNA.pos &lt;- isNA.diff &lt;- diff(isNA)
    isNA.pos[isNA.diff &lt; 0] &lt;- 0
    isNA.neg[isNA.diff &gt; 0] &lt;- 0
    which.isNA.neg &lt;- which(as.logical(isNA.neg))
    if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
    which.isNA.pos &lt;- which(as.logical(isNA.pos))
    which.isNA &lt;- which(as.logical(isNA))
    if (length(which.isNA.neg)==length(which.isNA.pos)){
        replacement &lt;- rep(which.isNA.pos[2:length(which.isNA.neg)],
                           which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] -
                           which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])
        replacement &lt;- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    } else {
        replacement &lt;- rep(which.isNA.pos[1:length(which.isNA.neg)], which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
        replacement &lt;- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    }
    replacement
}
</code></pre>
<p>The function <code>fill.NAs</code> is used as follows:</p>
<pre><code>y &lt;- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA &lt;- as.numeric(is.na(y))
replacement &lt;- fill.NAs(isNA)
if (length(replacement)){
    which.isNA &lt;- which(as.logical(isNA))
    to.replace &lt;- which.isNA[which(isNA==0)[1]:length(which.isNA)]
    y[to.replace] &lt;- y[replacement]
}
</code></pre>
<p><strong><em>Output</em></strong></p>
<pre><code>&gt; y
[1] NA  2  2  2  2  3  3  4  4  4
</code></pre>
<p>... which seems to work. But, man, is it ugly! Any suggestions?</p>
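For comparison, here is one possible base-R sketch of the "fill forward" operation described above. It is not the asker's method: the helper name `fill_forward` is hypothetical, and the approach (carrying the index of the last non-NA element along with `cummax`) is just one way to do it. Leading NAs are left as NA because there is nothing earlier to copy from.

```r
# Hypothetical helper (the name is an assumption, not from the question).
# Replaces each NA with the closest previous non-NA value.
fill_forward <- function(x) {
  not_na <- !is.na(x)
  # Position of the most recent non-NA element at each index (0 if none yet).
  idx <- cummax(seq_along(x) * not_na)
  # Indices before the first non-NA value have nothing to copy, so keep them NA.
  idx[idx == 0] <- NA
  x[idx]
}

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
yy <- fill_forward(y)
print(yy)  # values: NA 2 2 2 2 3 3 4 4 4
```

Assuming every column should be filled independently, the same helper could be applied to a whole `data.frame` with `df[] <- lapply(df, fill_forward)`. The `zoo` package also provides `na.locf()`, which performs this kind of fill; passing `na.rm = FALSE` keeps leading NAs in place.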
 
