How Can I Optimize the Performance of My R Code?
<p>I'm new to the R language, and I really love it for its powerful simplicity and rich packages.</p>
<p>To practice, I rewrote a simple KNN prediction program in R. The program was originally written in Python. But after I wrote the R version, I found it SIGNIFICANTLY slower than the Python version: it takes about 10 times as long to run.</p>
<p>I understand R is slow because it's an interpreted language, but I still suspect I wasn't using the language properly. I was following some basic rules of R that I have learned so far:</p>
<ol>
<li>Use built-in functions as much as possible, instead of writing your own.</li>
<li>Use <code>sapply</code> (or other members of the apply family) wherever possible, instead of explicit loops.</li>
</ol>
<p>Here's my runnable code; the functions defined should be pretty self-explanatory.</p>
<p>Can anyone give me some hints on how to optimize it?</p>
<p>Update:</p>
<p>I rewrote my code according to everybody's suggestions, including:</p>
<ol>
<li>Use a three-column data frame instead of the list structure.</li>
<li>I tried to vectorize as much as possible, but I don't know if I did it right.</li>
<li>I profiled my code using Rprof.</li>
</ol>
<p>To make this post cleaner, I put my code on ideone.com: <a href="http://ideone.com/od3ju" rel="noreferrer">http://ideone.com/od3ju</a></p>
<p>But honestly there's no obvious improvement, and the code still takes about the same time to run.</p>
<p>And here are the first lines of the output of summaryRprof:</p>
<pre><code>$by.self
               self.time self.pct total.time total.pct
"apply"             5.18    28.68      18.06    100.00
"FUN"               5.08    28.13      18.06    100.00
"-"                 1.22     6.76       1.22      6.76
"sum"               1.08     5.98       1.08      5.98
"^"                 0.70     3.88       0.70      3.88
"lapply"            0.58     3.21      18.06    100.00
"[.data.frame"      0.48     2.66       1.06      5.87
"sqrt"              0.42     2.33       0.42      2.33
"data.frame"        0.26     1.44       1.60      8.86
"unlist"            0.24     1.33       0.90      4.98
"!"                 0.22     1.22       0.22      1.22
"is.null"           0.22     1.22       0.22      1.22
"pmatch"            0.18     1.00       0.18      1.00
"match"             0.14     0.78       0.46      2.55
</code></pre>
<p>From the output I can see that <code>apply</code> and its <code>FUN</code> take most of the time, which I think makes sense since most of the work is done within <code>apply</code>.</p>
<p>So what's the next thing I should improve in my code?</p>
<p>Thanks in advance.</p>
<p><strong>UPDATE:</strong></p>
<p>Thanks to everyone's suggestions, I've learned a lot about R and have tuned my code into a MUCH faster version: <a href="http://ideone.com/x97yQ" rel="noreferrer">http://ideone.com/x97yQ</a></p>
<p>This version takes a little more than 0.5 s, which is at least 50 times faster than my original one, and it's even faster than the Python version. So I think I should take back my words about R being a slow language and learn more about it :)</p>
<p>Thanks, everyone, for your valuable suggestions!</p>
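<p>For readers without access to the ideone links, here is a minimal sketch of the kind of change the profile above points to. It is not the code from those links. It assumes the hot spot is a Euclidean distance computed row by row inside <code>apply</code> (a guess based on the <code>-</code>, <code>^</code>, <code>sum</code>, and <code>sqrt</code> entries), and it assumes the features are kept in a numeric matrix rather than a data frame. The sample data and the names <code>dist_apply</code>, <code>dist_vectorized</code>, and <code>knn_predict</code> are hypothetical.</p>

```r
# Hypothetical training data: 1000 points with 3 numeric features each.
set.seed(42)
train  <- matrix(rnorm(3000), ncol = 3)
labels <- sample(c("a", "b"), 1000, replace = TRUE)
query  <- c(0.1, -0.2, 0.3)

# Row-by-row version: one R-level function call per training point.
# This is the shape of code that makes "apply" and "FUN" dominate a profile.
dist_apply <- function(m, q) {
  apply(m, 1, function(row) sqrt(sum((row - q)^2)))
}

# Vectorized version: subtract q from every row in one step, then let
# rowSums() do the summation in compiled code.
dist_vectorized <- function(m, q) {
  diffs <- sweep(m, 2, q)  # default FUN is "-": subtracts q[j] from column j
  sqrt(rowSums(diffs^2))
}

# Hypothetical helper: majority vote among the k nearest training points.
knn_predict <- function(m, labels, q, k = 5) {
  nearest <- order(dist_vectorized(m, q))[seq_len(k)]
  names(which.max(table(labels[nearest])))
}

# Both versions should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(dist_apply(train, query),
                           dist_vectorized(train, query))))
```

<p>Wrapping each call in <code>system.time()</code> on your own data is the simplest way to check whether the vectorized form actually helps. Exact timings depend on the machine and the size of the data.</p>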