Note that there are some explanatory texts on larger screens.

plurals
  1. POR: Passing a data frame by reference
    text
    copied!<p>R has pass-by-value semantics, which minimizes accidental side effects (a good thing). However, when code is organized into many functions/methods for reusability/readability/maintainability and when that code needs to manipulate large data structures through, e.g., big data frames, through a series of transformations/operations the pass-by-value semantics leads to a lot of copying of data around and much heap thrashing (a bad thing). For example, a data frame that takes 50Mb on the heap that is passed as a function parameter will be copied at a minimum the same number of times as the function call depth and the heap size at the bottom of the call stack will be N*50Mb. If the functions return a transformed/modified data frame from deep in the call chain then the copying goes up by another N.</p> <p>The SO question <a href="https://stackoverflow.com/questions/596819/what-is-the-best-way-to-avoid-passing-a-data-frame-around">What is the best way to avoid passing a data frame around?</a> touches this topic but is phrased in a way that avoids directly asking the pass-by-reference question and the winning answer basically says, "yes, pass-by-value is how R works". That's not actually 100% accurate. R environments enable pass-by-reference semantics and OO frameworks such as <a href="http://code.google.com/p/r-proto/" rel="nofollow noreferrer">proto</a> use this capability extensively. For example, when a proto object is passed as a function argument, while its "magic wrapper" is passed by value, to the R developer the semantics are pass-by-reference.</p> <p>It seems that passing a big data frame by reference would be a common problem and I'm wondering how others have approached it and whether there are any libraries that enable this. In my searching I have not discovered one. </p> <p>If nothing is available, my approach would be to create a proto object that wraps a data frame. I would appreciate pointers about the syntactic sugar that should be added to this object to make it useful, e.g., overloading the $ and [[ operators, as well as any gotchas I should look out for. I'm not an R expert.</p> <p>Bonus points for a type-agnostic pass-by-reference solution that integrates nicely with R, though my needs are exclusively with data frames.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload