Note that there are some explanatory texts on larger screens.

plurals
  1. POHow to do a basic left outer join with data.table in R?
    text
    copied!<p>I have a data.table of a and b that I've partitioned into <code>below</code> with b &lt; .5 and <code>above</code> with b > .5:</p> <pre><code>DT = data.table(a=as.integer(c(1,1,2,2,3,3)), b=c(0,0,0,1,1,1)) above = DT[DT$b &gt; .5] below = DT[DT$b &lt; .5, list(a=a)] </code></pre> <p>I'd like to do a left outer join between <code>above</code> and <code>below</code>: for each <code>a</code> in <code>above</code>, count the number of rows in <code>below</code>. This is equivalent to the following in SQL:</p> <pre><code>with dt as (select 1 as a, 0 as b union select 1, 0 union select 2, 0 union select 2, 1 union select 3, 1 union select 3, 1), above as (select a, b from dt where b &gt; .5), below as (select a, b from dt where b &lt; .5) select above.a, count(below.a) from above left outer join below on (above.a = below.a) group by above.a; a | count ---+------- 3 | 0 2 | 1 (2 rows) </code></pre> <p>How do I accomplish the same thing with data.tables? This is what I tried so far:</p> <pre><code>&gt; key(below) = 'a' &gt; below[above, list(count=length(b))] a count [1,] 2 1 [2,] 3 1 [3,] 3 1 &gt; below[above, list(count=length(b)), by=a] Error in eval(expr, envir, enclos) : object 'b' not found &gt; below[, list(count=length(a)), by=a][above] a count b [1,] 2 1 1 [2,] 3 NA 1 [3,] 3 NA 1 </code></pre> <p>I should also be more specific in that I already tried <code>merge</code> but that blows through the memory on my system (and the dataset takes only about 20% of my memory).</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload