
plurals
  1. POR grouping by condition in data.table
    primarykey
    data
    text
    <p>In R, I have a large data.table. For every row, I want to count the rows whose x1 lies within some tolerance (+/- tol) of that row's x1. I can get this to work using adply, but it's too slow. It seems like the sort of thing data.table would be good for - in fact, I'm already using data.table for part of the computation.</p>
    <p>Is there a way to do this entirely with data.table? Here is an example:</p>
<pre><code>library(data.table)
library(plyr)

my.df = data.table(x1 = 1:1000, x2 = 4:1003)
tol = 3

# For each row, count the rows whose x1 lies strictly inside (x1 - tol, x1 + tol)
adply(my.df, 1, function(df) my.df[x1 &gt; (df$x1 - tol) &amp; x1 &lt; (df$x1 + tol), .N])
</code></pre>
    <p>Results:</p>
<pre><code>        x1   x2 V1
   1:    1    4  3
   2:    2    5  4
   3:    3    6  5
   4:    4    7  5
   5:    5    8  5
  ---
 996:  996  999  5
 997:  997 1000  5
 998:  998 1001  5
 999:  999 1002  4
1000: 1000 1003  3
</code></pre>
    <h2>Update:</h2>
    <p>Here's a sample dataset that is a little closer to my real data:</p>
<pre><code>library(microbenchmark)

set.seed(10)
x = seq(1, 100000000, 100000)
x = x + sample(1:50000, length(x), replace = TRUE)
x2 = x + sample(1:50000, length(x), replace = TRUE)
my.df = data.table(x1 = x, x2 = x2)
setkey(my.df, x1)
tol = 100000

# My original adply approach
og = function(my.df) {
  adply(my.df, 1, function(df) my.df[x1 &gt; (df$x1 - tol) &amp; x1 &lt; (df$x1 + tol), .N])
}

# ed() and ar() are the functions from the answers by @eddi and @Arun (not shown here)
microbenchmark(r_ed &lt;- ed(copy(my.df)),
               r_ar &lt;- ar(copy(my.df)),
               r_og &lt;- og(copy(my.df)),
               times = 1)

Unit: milliseconds
                    expr         min          lq      median          uq         max neval
 r_ed &lt;- ed(copy(my.df))    8.553137    8.553137    8.553137    8.553137    8.553137     1
 r_ar &lt;- ar(copy(my.df))   10.229438   10.229438   10.229438   10.229438   10.229438     1
 r_og &lt;- og(copy(my.df)) 1424.472844 1424.472844 1424.472844 1424.472844 1424.472844     1
</code></pre>
    <p>Obviously, the solutions from both @eddi and @Arun are much faster than mine. Now I just have to try to understand rolling joins.</p>
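For readers wondering what a data.table-only version of the first example might look like, here is a minimal sketch using a non-equi self-join. It is not necessarily what @eddi or @Arun proposed (the question mentions rolls), and it assumes a data.table version that supports non-equi joins in `on=`. The helper names `bounds`, `lo`, `hi`, `counts` and `res` are made up for illustration, and `tol` is written as an integer (`3L`) here so that all join columns share one type.

```r
library(data.table)

my.df <- data.table(x1 = 1:1000, x2 = 4:1003)
tol <- 3L  # integer on purpose, so x1, lo and hi all have the same type

# Hypothetical helper table: one open interval (lo, hi) per row of my.df
bounds <- my.df[, .(lo = x1 - tol, hi = x1 + tol)]

# Non-equi self-join: for each row of `bounds`, match the rows of my.df
# with lo < x1 < hi; by = .EACHI evaluates .N once per row of `bounds`
counts <- my.df[bounds, on = .(x1 > lo, x1 < hi), .N, by = .EACHI]

# Attach the counts to a copy of the original table, in the original row order
res <- copy(my.df)[, V1 := counts$N]
head(res, 5)
```

With this toy data the `V1` column should agree with the adply output shown in the question (3, 4, 5, 5, 5 for the first five rows). Whether it is faster than a rolling-join approach on the larger dataset is not something this sketch establishes.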
 
