  1. Memoize and vectorize a custom function
<p>I want to know how to vectorize and memoize a custom function in R. It seems my way of thinking is not aligned with the way R operates, so I gladly welcome any links to good reading material. For example, The R Inferno is a nice resource, but it didn't help me figure out memoization in R.</p>

<p>More generally, can you provide a relevant usage example for the <code>memoise</code> or <code>R.cache</code> packages?</p>

<p>I haven't been able to find any other discussions on this subject. Searching for "memoise" or "memoize" on r-bloggers.com returns zero results. Searching for those keywords at <a href="http://r-project.markmail.org/" rel="nofollow noreferrer">http://r-project.markmail.org/</a> does not return helpful discussions. I emailed the mailing list and did not receive a complete answer.</p>

<p>I am not solely interested in memoizing the GC function, and I am aware of Bioconductor and the various packages available there.</p>

<p>Here's my data:</p>

<pre><code>seqs &lt;- c("","G","C","CCC","T","","TTCCT","","C","CTC")
</code></pre>

<p>Some sequences are missing, so they're blank <code>""</code>.</p>

<p>I have a function for calculating GC content:</p>

<pre><code>&gt; GC &lt;- function(s) {
    if (!is.character(s)) return(NA)
    n &lt;- nchar(s)
    if (n == 0) return(NA)
    m &lt;- gregexpr('[GCSgcs]', s)[[1]]
    if (m[1] &lt; 1) return(0)
    return(100.0 * length(m) / n)
}
</code></pre>

<p>It works:</p>

<pre><code>&gt; GC('')
[1] NA
&gt; GC('G')
[1] 100
&gt; GC('GAG')
[1] 66.66667
&gt; sapply(seqs, GC)
                  G         C       CCC         T               TTCCT
       NA 100.00000 100.00000 100.00000   0.00000        NA  40.00000        NA
        C       CTC
100.00000  66.66667
</code></pre>

<p>I want to memoize it. Then, I want to vectorize it.</p>

<p>Apparently, I must have the wrong mindset for using the <code>memoise</code> or <code>R.cache</code> R packages:</p>

<pre><code>&gt; system.time(dummy &lt;- sapply(rep(seqs,100), GC))
   user  system elapsed
  0.044   0.000   0.054
&gt;
&gt; library(memoise)
&gt; GCm1 &lt;- memoise(GC)
&gt; system.time(dummy &lt;- sapply(rep(seqs,100), GCm1))
   user  system elapsed
  0.164   0.000   0.173
&gt;
&gt; library(R.cache)
&gt; GCm2 &lt;- addMemoization(GC)
&gt; system.time(dummy &lt;- sapply(rep(seqs,100), GCm2))
   user  system elapsed
 10.601   0.252  10.926
</code></pre>

<p>Notice that the memoized functions are slower, not faster: roughly 3 times slower with <code>memoise</code> and roughly 200 times slower with <code>R.cache</code>.</p>

<p>I tried the <code>hash</code> package, but things seem to be happening behind the scenes and I don't understand the output. The sequence <code>C</code> should have a value of <code>100</code>, not <code>NULL</code>.</p>

<p>Note that using <code>has.key(s, cache)</code> instead of <code>exists(s, cache)</code> results in the same output. Also, using <code>cache[s] &lt;&lt;- result</code> instead of <code>cache[[s]] &lt;&lt;- result</code> results in the same output.</p>

<pre><code>&gt; cache &lt;- hash()
&gt; GCc &lt;- function(s) {
    if (!is.character(s) || nchar(s) == 0) {
        return(NA)
    }
    if(exists(s, cache)) {
        return(cache[[s]])
    }
    result &lt;- GC(s)
    cache[[s]] &lt;&lt;- result
    return(result)
}
&gt; sapply(seqs,GCc)
[[1]]
[1] NA

$G
[1] 100

$C
NULL

$CCC
[1] 100

$T
NULL

[[6]]
[1] NA

$TTCCT
[1] 40

[[8]]
[1] NA

$C
NULL

$CTC
[1] 66.66667
</code></pre>

<p>At least I figured out how to vectorize:</p>

<pre><code>&gt; GCv &lt;- Vectorize(GC)
&gt; GCv(seqs)
                  G         C       CCC         T               TTCCT
       NA 100.00000 100.00000 100.00000   0.00000        NA  40.00000        NA
        C       CTC
100.00000  66.66667
</code></pre>

<p><strong>Relevant Stack Overflow posts:</strong></p>

<ul>
<li><a href="https://stackoverflow.com/questions/7262485/options-for-caching-memoization-hashing-in-r">Options for caching / memoization / hashing in R</a></li>
</ul>
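<p>For illustration only, here is a minimal base-R sketch of the two goals in the question: caching results per sequence and applying the function to a whole vector. It is not the approach of <code>memoise</code>, <code>R.cache</code>, or <code>hash</code>. The helper names <code>memoize_chr</code>, <code>GCm</code>, and <code>GCu</code> are hypothetical and do not come from any package. The cache is a plain environment, and lookups pass <code>inherits = FALSE</code> so that keys such as <code>C</code> and <code>T</code>, which are also names of objects in base R, are searched for only in the cache. That may be relevant to the <code>NULL</code> results for <code>C</code> and <code>T</code>, though this has not been verified against the <code>hash</code> package.</p>

```r
# GC content of a single sequence, as defined in the question.
GC <- function(s) {
  if (!is.character(s)) return(NA)
  n <- nchar(s)
  if (n == 0) return(NA)
  m <- gregexpr('[GCSgcs]', s)[[1]]
  if (m[1] < 1) return(0)
  return(100.0 * length(m) / n)
}

# Hypothetical helper: memoize a function of one string argument,
# using an environment as the cache (an assumption of this sketch).
memoize_chr <- function(f) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(s) {
    # "" cannot be used as a variable name, so such input bypasses the cache.
    if (!is.character(s) || !nzchar(s)) return(f(s))
    # inherits = FALSE restricts the lookup to the cache itself.
    if (exists(s, envir = cache, inherits = FALSE)) {
      return(get(s, envir = cache, inherits = FALSE))
    }
    result <- f(s)
    assign(s, result, envir = cache)
    result
  }
}

GCm <- memoize_chr(GC)

# Hypothetical vectorized variant: evaluate each distinct sequence once,
# then map the results back onto the original positions.
GCu <- function(x) {
  u <- unique(x)
  vals <- vapply(u, function(s) as.numeric(GC(s)), numeric(1),
                 USE.NAMES = FALSE)
  out <- vals[match(x, u)]
  names(out) <- x
  out
}
```

<p>With the data from the question, <code>sapply(seqs, GCm)</code> and <code>GCu(seqs)</code> should give the same values as <code>sapply(seqs, GC)</code>. For a function as cheap as <code>GC</code>, the cache lookup may cost as much as the computation itself, so any speed-up would need to be measured rather than assumed.</p>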
 
