StackOverflow2013

<h3>See @eddi's answer for a faster solution (to this particular problem). It also works when <code>x1</code> is not an integer.</h3>

<p>The data structure you're looking for is an <a href="http://en.wikipedia.org/wiki/Interval_tree" rel="nofollow"><strong>Interval Tree</strong></a>, and there's a Bioconductor package called <a href="http://bioconductor.org/packages/2.12/bioc/html/IRanges.html" rel="nofollow"><strong>IRanges</strong></a> that accomplishes this task. It's hard to beat that.</p>

<pre><code>require(IRanges)
require(data.table)
my.df[, res := countOverlaps(IRanges(my.df$x1, width=1),
                             IRanges(my.df$x1-tol+1, my.df$x1+tol-1))]
</code></pre>

<hr>

<h3>Some explanation:</h3>

<p>If you break down the code, you can write it in three lines:</p>

<pre><code># each x1 as a range of width 1
ir1 <- IRanges(my.df$x1, width=1)
# for each x1, the window of integers strictly closer than tol
ir2 <- IRanges(my.df$x1-tol+1, my.df$x1+tol-1)
# for each entry of ir1, the number of ranges in ir2 it overlaps
cnt <- countOverlaps(ir1, ir2)
</code></pre>

<p>What we essentially do is create two sets of "ranges" (just type <code>ir1</code> and <code>ir2</code> to see what they look like). Then we ask, for each entry in <code>ir1</code>, how many ranges in <code>ir2</code> it overlaps (this is the "interval tree" part), which is very efficient. The <code>type</code> argument of <code>countOverlaps</code> defaults to <code>"any"</code>; you can explore the other types if you want. It's extremely useful. The <code>findOverlaps</code> function is also relevant.</p>

<p>Note: There can be faster solutions (in fact there is one, see @eddi's) for this particular case, where the width of <code>ir1</code> is 1. But for problems where the widths are variable and/or > 1, this should be the fastest.</p>

<hr>

<h3>Benchmarking:</h3>

<pre><code>ag <- function(my.df) my.df[, res := sum(abs(my.df$x1-x1) < tol), by=x1]

ro <- function(my.df) {
  my.df[, res := {
    y = my.df$x1
    sum(y > (x1 - tol) & y < (x1 + tol))
  }, by=x1]
}

ar <- function(my.df) {
  my.df[, res := countOverlaps(IRanges(my.df$x1, width=1),
                               IRanges(my.df$x1-tol+1, my.df$x1+tol-1))]
}

require(microbenchmark)
microbenchmark(r1 <- ag(copy(my.df)),
               r2 <- ro(copy(my.df)),
               r3 <- ar(copy(my.df)),
               times=100)

Unit: milliseconds
                  expr      min       lq   median       uq       max neval
 r1 <- ag(copy(my.df)) 33.15940 39.63531 41.61555 44.56616 208.99067   100
 r2 <- ro(copy(my.df)) 69.35311 76.66642 80.23917 84.67419 344.82031   100
 r3 <- ar(copy(my.df)) 11.22027 12.14113 13.21196 14.72830  48.61417   100  <~~~

identical(r1, r2) # TRUE
identical(r1, r3) # TRUE
</code></pre>
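<p>To make the counting semantics concrete, here is a minimal base R sketch of what the overlap count computes in the special case where every query has width 1. It is not the IRanges implementation and is not taken from @eddi's answer. The toy vector <code>x1</code>, the value of <code>tol</code>, and the helper name <code>count_within</code> are assumptions made up for illustration, and integer <code>x1</code> and <code>tol</code> are assumed, as in the code above.</p>

```r
# Hypothetical toy data; `x1` and `tol` mirror the names used in the answer.
x1  <- c(1L, 3L, 4L, 10L, 11L, 12L)
tol <- 2L

# For each point in x, count the closed windows [x - tol + 1, x + tol - 1]
# (one window per element of x) that contain it.
# Assumes integer x and tol, so the window matches abs(difference) < tol.
count_within <- function(x, tol) {
  starts <- sort(x - tol + 1L)
  ends   <- sort(x + tol - 1L)
  # windows that start at or before the point ...
  n_started <- findInterval(x, starts)
  # ... minus windows that already ended strictly before the point
  n_ended <- findInterval(x, ends, left.open = TRUE)
  n_started - n_ended
}

fast <- count_within(x1, tol)

# Brute-force reference, equivalent to the `ag` approach in the benchmark.
brute <- sapply(x1, function(v) sum(abs(x1 - v) < tol))

stopifnot(all(fast == brute))
print(fast)
```

<p>Sorting the window starts and ends once and then using binary search keeps the work at roughly O(n log n), compared with the O(n^2) brute-force comparison. This shortcut only applies to width-1 queries; for queries of variable width, <code>countOverlaps</code> remains the general tool.</p>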
