  1. R optimization: How can I avoid a for loop in this situation?
I'm trying to do a simple genomic track intersection in R, and I'm running into major performance problems, probably related to my use of for loops.

In this situation, I have pre-defined windows at intervals of 100bp and I'm trying to calculate how much of each window is covered by the annotations in mylist. Graphically, it looks something like this:

```
          0    100   200   300   400   500   600
windows:  |-----|-----|-----|-----|-----|-----|
mylist:    |-|   |-----------|
```

So I wrote some code to do just that, but it's fairly slow and has become a bottleneck in my code:

```
## window for each 100-bp segment
windows <- numeric(6)

## second track
mylist <- vector("list")
mylist[[1]] <- c(1, 20)
mylist[[2]] <- c(120, 320)

## do the intersection
for (i in 1:length(mylist)) {
  st <- floor(mylist[[i]][1] / 100) + 1
  sp <- floor(mylist[[i]][2] / 100) + 1
  for (j in st:sp) {
    b <- max((j - 1) * 100, mylist[[i]][1])
    e <- min(j * 100, mylist[[i]][2])
    windows[j] <- windows[j] + e - b + 1
  }
}

print(windows)
# [1]  20  81 101  21   0   0
```

Naturally, this is being used on data sets that are much larger than the example I provide here. Through some profiling, I can see that the bottleneck is in the for loops, but my clumsy attempt to vectorize it using *apply functions resulted in code that runs an order of magnitude more slowly.

I suppose I could write something in C, but I'd like to avoid that if possible. Can anyone suggest another approach that will speed this calculation up?
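For illustration only (this sketch is not from the question, and the asker's own *apply attempts may have looked different): the same per-window overlap arithmetic can be written without explicit loops in base R using outer(), pmax()/pmin() and rowSums(). Names such as n_windows, win_start, iv and windows_vectorized, and the hard-coded count of six windows, are assumptions made for this example.

```
## Hypothetical vectorized sketch, not taken from the question.
## For every window/interval pair, the covered length is
##   max(0, min(window_end, interval_end) - max(window_start, interval_start) + 1)
## computed here on an n_windows x n_intervals grid and summed per window.

mylist <- list(c(1, 20), c(120, 320))        # same toy intervals as above

n_windows <- 6                               # assumed fixed window count for the example
win_start <- (seq_len(n_windows) - 1) * 100  # window starts: 0, 100, ..., 500
win_end   <- seq_len(n_windows) * 100        # window ends:   100, 200, ..., 600

iv       <- do.call(rbind, mylist)           # one row per interval: start, end
iv_start <- iv[, 1]
iv_end   <- iv[, 2]

lo <- outer(win_start, iv_start, pmax)       # pairwise later start, n_windows x n_intervals
hi <- outer(win_end,   iv_end,   pmin)       # pairwise earlier end
overlap <- pmax(hi - lo + 1, 0)              # clamp non-overlapping pairs to 0

windows_vectorized <- rowSums(overlap)       # total covered length per window
print(windows_vectorized)
# [1]  20  81 101  21   0   0
```

Whether this beats the double loop depends on the data: the intermediate matrices are n_windows x n_intervals, so memory can become the limiting factor at genome scale, where interval-based tooling such as Bioconductor's IRanges/GenomicRanges coverage machinery is the more common route.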