Options for caching / memoization / hashing in R
    <p>I am trying to find a simple way to use something like Perl's hashes in R (essentially caching), as I intend to do both Perl-style hashing and write my own memoisation of calculations. However, others have beaten me to the punch and have already written packages for memoisation. The more I dig, the more I find, e.g. <code>memoise</code> and <code>R.cache</code>, but the differences between them aren't readily clear. In addition, it's not clear how else one can get Perl-style hashes (or Python-style dictionaries) and write one's own memoization, other than to use the <code>hash</code> package, which doesn't seem to underpin the two memoization packages.</p> <p>Since I can find no information on CRAN or elsewhere to distinguish between the options, perhaps this should be a community wiki question on SO: What are the options for memoization and caching in R, and what are their differences?</p> <hr> <p>As a basis for comparison, here is a list of the options I've found. Also, it seems to me that all of them depend on hashing, so I'll note the hashing options as well. Key/value storage is somewhat related, but opens a huge can of worms regarding DB systems (e.g. 
BerkeleyDB, Redis, MemcacheDB and <a href="http://en.wikipedia.org/wiki/Key-value_store" rel="noreferrer">scores of others</a>).</p> <p>It looks like the options are:</p> <h3>Hashing</h3> <ul> <li><a href="http://cran.r-project.org/web/packages/digest/index.html" rel="noreferrer">digest</a> - provides hashing for arbitrary R objects.</li> </ul> <h3>Memoization</h3> <ul> <li><a href="http://cran.r-project.org/web/packages/memoise/index.html" rel="noreferrer">memoise</a> - a very simple tool for memoization of functions.</li> <li><a href="http://cran.r-project.org/web/packages/R.cache/index.html" rel="noreferrer">R.cache</a> - offers more functionality for memoization, though it seems some of the functions lack examples.</li> </ul> <h3>Caching</h3> <ul> <li><a href="http://cran.r-project.org/web/packages/hash/" rel="noreferrer">hash</a> - provides caching functionality akin to Perl's hashes and Python dictionaries.</li> </ul> <h3>Key/value storage</h3> <p>These are basic options for external storage of R objects.</p> <ul> <li><a href="http://cran.r-project.org/web/packages/stashR/index.html" rel="noreferrer">stashR</a></li> <li><a href="http://cran.r-project.org/web/packages/filehash/index.html" rel="noreferrer">filehash</a></li> </ul> <h3>Checkpointing</h3> <ul> <li><a href="http://cran.r-project.org/web/packages/cacher/index.html" rel="noreferrer">cacher</a> - this seems to be more akin to <a href="http://en.wikipedia.org/wiki/Application_checkpointing" rel="noreferrer">checkpointing</a>.</li> <li><a href="http://www.omegahat.org/CodeDepends/" rel="noreferrer">CodeDepends</a> - an OmegaHat project that underpins <code>cacher</code> and provides some useful functionality.</li> <li><a href="http://dmtcp.sourceforge.net/" rel="noreferrer">DMTCP</a> (not an R package) - appears to support checkpointing in a bunch of languages, and <a href="http://r.789695.n4.nabble.com/DMTCP-checkpoint-restart-for-R-td3735097.html" rel="noreferrer">a developer recently sought 
assistance testing DMTCP checkpointing in R</a>.</li> </ul> <h3>Other</h3> <ul> <li>Base R supports: named vectors and lists, row and column names of data frames, and names of items in environments. It seems to me that using a list is a bit of a kludge. (There's also <code>pairlist</code>, but <a href="http://cran.r-project.org/doc/manuals/R-lang.html#Pairlist-objects" rel="noreferrer">it is deprecated</a>.)</li> <li>The <a href="http://datatable.r-forge.r-project.org/" rel="noreferrer">data.table</a> package supports rapid lookups of elements in a data table.</li> </ul> <hr> <h3>Use case</h3> <p>Although I'm mostly interested in knowing the options, I have two basic use cases that arise:</p> <ol> <li>Caching: Simple counting of strings. [Note: This isn't for NLP, but general use, so NLP libraries are overkill; tables are inadequate because I prefer not to wait until the entire set of strings is loaded into memory. Perl-style hashes are at the right level of utility.]</li> <li>Memoization of monstrous calculations.</li> </ol> <p>These really arise because I'm <a href="https://stackoverflow.com/questions/7252602/digging-into-r-profiling-information">digging into the profiling of some slooooow code</a> and I'd really like to just count simple strings and see if I can speed up some calculations via memoization. Being able to hash the input values, even if I don't memoize, would let me see if memoization can help.</p> <hr> <p>Note 1: The <a href="http://cran.r-project.org/web/views/ReproducibleResearch.html" rel="noreferrer">CRAN Task View on Reproducible Research</a> lists a couple of the packages (<code>cacher</code> and <code>R.cache</code>), but there is no elaboration on usage options.</p> <p>Note 2: To aid others looking for related code, here are a few notes on some of the authors or packages. Some of the authors use SO. 
:)</p> <ul> <li>Dirk Eddelbuettel: <code>digest</code> - a lot of other packages depend on this.</li> <li>Roger Peng: <code>cacher</code>, <code>filehash</code>, <code>stashR</code> - these address different problems in different ways; see <a href="http://www.biostat.jhsph.edu/~rpeng/software/index.html" rel="noreferrer">Roger's site</a> for more packages.</li> <li>Christopher Brown: <code>hash</code> - seems to be a useful package, but the links to ODG are down, unfortunately.</li> <li>Henrik Bengtsson: <code>R.cache</code> &amp; Hadley Wickham: <code>memoise</code> - it's not yet clear when to prefer one package over the other.</li> </ul> <p>Note 3: Some people use memoise/memoisation; others use memoize/memoization. Keep both spellings in mind if you're searching around. Henrik uses "z" and Hadley uses "s".</p>
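    <p>For concreteness, here is a minimal base-R sketch of the two use cases above, using an environment as the Perl-style hash mentioned under "Other". The names <code>count_string</code>, <code>memoize_sketch</code> and <code>slow_square</code> are made up for illustration, and building the cache key with <code>deparse</code> is an assumption that only suits small, simple arguments. It is not meant to reflect how <code>memoise</code> or <code>R.cache</code> work internally.</p>

```r
# Use case 1: count strings one at a time, using an environment as a
# Perl-style hash. Keys must be non-empty strings.
counts <- new.env(hash = TRUE)

count_string <- function(key, env = counts) {
  # inherits = FALSE restricts the lookup to this environment only
  current <- if (exists(key, envir = env, inherits = FALSE)) {
    get(key, envir = env, inherits = FALSE)
  } else {
    0L
  }
  assign(key, current + 1L, envir = env)
  invisible(current + 1L)
}

for (s in c("apple", "pear", "apple", "fig", "apple")) count_string(s)

# Collect the counts into a named integer vector (ls() returns sorted keys)
tally <- unlist(mget(ls(counts), envir = counts))
print(tally)

# Use case 2: a hand-rolled memoizer (hypothetical helper, base R only).
memoize_sketch <- function(f) {
  cache <- new.env(hash = TRUE)
  function(...) {
    # Assumption: the deparsed argument list is a usable key. For large or
    # complex inputs a digest of the arguments would be a better key.
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Track how often the underlying function really runs
calls <- 0L
slow_square <- function(x) {
  calls <<- calls + 1L
  x^2
}
fast_square <- memoize_sketch(slow_square)

first  <- fast_square(4)  # computed
second <- fast_square(4)  # served from the cache
print(calls)              # 1: the second call did not run slow_square
```

    <p>The same counting trick gives a rough answer to whether memoization would help: tally the keys of the inputs first, and if most keys occur only once, a cache will rarely be hit.</p>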