data.table by group by - order of keys and missing keys
<p>If I have a data.table</p>
<pre><code>&gt; DT1 &lt;- data.table(A=rep(c('A', 'B'), 3), B=rep(c(1,2,3), 2), val=rnorm(6), key='A,B')
&gt; DT1
   A B        val
1: A 1 -1.6283314
2: B 2  0.5337604
3: A 3  0.9991301
4: B 1  1.1421400
5: A 2  0.1230095
6: B 3  0.4988504
</code></pre>
<p>and I want to subset by more than one key, like so:</p>
<pre><code>&gt; DT1[J('A', 1)]
   A B          val
1: A 1 -0.004898047
</code></pre>
<p>However, the join depends on the order of the keys, so the value for key A must always come first. The following does not work, even if you specify names (either with <code>J()</code> or with a <code>list()</code>):</p>
<pre><code>&gt; DT1[J(1, 'A')]
Error in `[.data.table`(DT1, J(1, "A")) :
  x.'A' is a character column being joined to i.'V1' which is type 'double'. Character columns must join to factor or character columns.
&gt; DT1[J(B=1, A='A')]
Error in `[.data.table`(DT1, J(B = 1, A = "A")) :
  x.'A' is a character column being joined to i.'B' which is type 'double'. Character columns must join to factor or character columns.
</code></pre>
<p>Is there a syntax that lets you do this kind of subsetting by <code>i</code> without knowing the order of the keys?</p>
<p><strong>Added:</strong> Another use case would be subsetting by B only and not by A. Is there a way to skip keys in the subsetting? The current answers that create wrapper functions for <code>J()</code> don't seem to allow this.</p>
<p><strong>EDIT:</strong> Some have mentioned doing it the data.frame way. I know that you can subset with a vector of logical values, but this is slow because it scans the entire table:</p>
<pre><code>&gt; DT1 &lt;- data.table(A=rep(c(1,2,3), 100000), B=rep(c('A', 'B'), 150000), val=rnorm(300000), key='A,B')
&gt; system.time(DT1[DT1$A==1 &amp; DT1$B=="A"])
   user  system elapsed
  0.080   0.000   0.054
&gt; system.time(DT1[J(1, 'A')])
   user  system elapsed
  0.004   0.000   0.004
</code></pre>
<p>Some references to related discussions: <a href="https://stackoverflow.com/questions/8509595/how-do-i-select-rows-by-two-criteria-in-data-table-in-r">(1)</a></p>