
DT[!(x == .)] and DT[x != .] treat NA in x inconsistently
    <p>This is something that I thought I should ask following <a href="https://stackoverflow.com/q/16221742/559784"><strong>this question</strong></a>. I'd like to confirm whether this is a bug/inconsistency before filing it as such in the R-forge tracker.</p> <p>Consider this <code>data.table</code>:</p>
<pre><code>require(data.table)
DT &lt;- data.table(x=c(1,0,NA), y=1:3)
</code></pre>
<p>Now, to access all rows of <code>DT</code> where <code>x</code> is <em>not</em> 0, we could do it in either of these ways:</p>
<pre><code>DT[x != 0]
#    x y
# 1: 1 1

DT[!(x == 0)]
#     x y
# 1:  1 1
# 2: NA 3
</code></pre>
<p><strong>Accessing <code>DT[x != 0]</code> and <code>DT[!(x==0)]</code> gives different results even though the underlying logical operations are equivalent.</strong></p> <p><strong>Note:</strong> Converting this into a data.frame and running these operations gives identical results for both logically equivalent operations, but that result is <em>different</em> from both of these data.table results. For an explanation of why, look at <code>?`[`</code> under the section <code>NAs in indexing</code>.</p> <p><strong>Edit:</strong> Since some of you have stressed consistency with <code>data.frame</code>, here's the output of the same operations on a data.frame:</p>
<pre><code>DF &lt;- as.data.frame(DT)
# check ?`[` under the section `NAs in indexing` as to why this happens
DF[DF$x != 0, ]
#     x  y
# 1   1  1
# NA NA NA

DF[!(DF$x == 0), ]
#     x  y
# 1   1  1
# NA NA NA
</code></pre>
<p>I think this is an inconsistency and both <em>should provide</em> the same result. But which result? The documentation for <code>[.data.table</code> says:</p> <blockquote> <p>i ---> Integer, logical or character vector, expression of column names, list or data.table.</p> <p>integer and logical vectors work the same way they do in [.data.frame. <strong>Other than NAs in logical i are treated as FALSE</strong> and a single NA logical is not recycled to match the number of rows, as it is in [.data.frame.</p> </blockquote> <p>It's clear why the results differ from what one would get by doing the same operation on a <code>data.frame</code>. But still, within data.table, if NAs in logical <code>i</code> are treated as FALSE, then both of them should return:</p>
<pre><code>#    x y
# 1: 1 1
</code></pre>
<p>I went through the <code>[.data.table</code> source code and now understand <em>why</em> this is happening. See <a href="https://stackoverflow.com/a/16222108/559784"><strong>this post</strong></a> for a detailed explanation.</p> <p>Briefly, <code>x != 0</code> evaluates to a logical vector and its <code>NA</code> is replaced with FALSE. With <code>!(x==0)</code>, however, <code>(x == 0)</code> is first evaluated to a logical vector and its <code>NA</code> is replaced with FALSE. <em>Then</em> the negation happens, so that <code>NA</code> effectively becomes <code>TRUE</code>.</p> <p>So, my first (or rather main) question is: is this a bug/inconsistency? If so, I'll file it as one in the data.table R-forge tracker. If not, I'd like to know the reason for this difference, and I would like to suggest a correction to the documentation explaining it (to the already amazing documentation!).</p> <p><strong>Edit:</strong> Following up on the comments, the second question is: should <code>data.table</code>'s handling of subsetting with columns containing <code>NA</code> resemble that of <code>data.frame</code>? (I agree with @Roland's comment that this <em>may</em> very well lead to opinion-based answers, and I'm perfectly fine with this question not being answered at all.)</p>
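The two code paths described in the question can be mimicked with plain base R logical vectors. This is a minimal sketch that assumes data.table behaves exactly as described there, with NA replaced by FALSE before any `!` negation is applied. `na_to_false` is a hypothetical helper, not a data.table function, and the code is not data.table's actual source.

```r
# Sketch (base R only) of the two code paths described in the question.
# It mimics the described behaviour; it is NOT data.table's internal code.
x <- c(1, 0, NA)

# Hypothetical helper: treat NA in a logical index as FALSE.
na_to_false <- function(lgl) {
  lgl[is.na(lgl)] <- FALSE
  lgl
}

# Path 1, DT[x != 0]: compare first, then NA becomes FALSE.
# c(TRUE, FALSE, NA) -> c(TRUE, FALSE, FALSE)
idx_ne <- na_to_false(x != 0)

# Path 2, DT[!(x == 0)]: the inner comparison has its NA replaced by FALSE,
# and only afterwards is the result negated, so the NA position ends up TRUE.
# c(FALSE, TRUE, NA) -> c(FALSE, TRUE, FALSE) -> c(TRUE, FALSE, TRUE)
idx_not_eq <- !na_to_false(x == 0)

which(idx_ne)      # rows kept by the first path:  [1] 1
which(idx_not_eq)  # rows kept by the second path: [1] 1 3
```

Under this reading, the NA row survives only in the second path because the NA-to-FALSE step runs before the negation rather than after it.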