Writing robust R code: namespaces, masking and using the `::` operator
<h2>Short version</h2>
<p>For those who don't want to read through my "case", this is the essence:</p>
<ol>
<li>What is the recommended way of minimizing the chances of new packages breaking existing code, i.e. of making the code you write <strong>as robust as possible</strong>?</li>
<li><p>What is the recommended way of making the best use of the <strong>namespace mechanism</strong> when </p> <p>a) just <em>using</em> contributed packages (say, in some R analysis project)?</p> <p>b) <em>developing</em> your own packages?</p></li>
<li><p>How best to avoid conflicts with respect to <strong>formal classes</strong> (mostly <a href="http://stat.ethz.ch/R-manual/R-devel/library/methods/html/refClass.html" rel="noreferrer">Reference Classes</a> in my case) as there isn't even a namespace mechanism comparable to <code>::</code> for classes (AFAIU)?</p></li>
</ol>
<hr>
<h2>The way the R universe works</h2>
<p>This is something that's been nagging in the back of my mind for about two years now, yet I don't feel as if I have come to a satisfying solution. Plus I feel it's getting worse.</p>
<p>We see an ever increasing number of packages on <a href="http://cran.at.r-project.org/web/packages/available_packages_by_name.html" rel="noreferrer">CRAN</a>, <a href="https://github.com/" rel="noreferrer">GitHub</a>, <a href="http://r-forge.r-project.org/" rel="noreferrer">R-Forge</a> and the like, which is simply terrific.</p>
<p>In such a decentralized environment, it is natural that the code base that makes up R (let's say that's <em>base R</em> and <em>contributed R</em>, for simplicity) will deviate from an ideal state with respect to robustness: people follow different conventions, there's S3, S4, S4 Reference Classes, etc. Things can't be as "aligned" as they would be if there were a "<em>central clearing instance</em>" that enforced conventions. That's okay.</p>
<h2>The problem</h2>
<p>Given the above, it can be very hard to use R to write robust code. 
Not everything you need will be in base R. For certain projects you will end up loading quite a few contributed packages.</p>
<p><strong>IMHO, the biggest issue in that respect is the way the namespace concept is put to use in R: R allows for simply writing the name of a certain function/method without explicitly requiring its namespace (i.e. <code>foo</code> vs. <code>namespace::foo</code>)</strong>. </p>
<p>So for the sake of simplicity, that's what everyone is doing. But that way, name clashes, broken code and the need to rewrite/refactor your code are just a matter of time (or of the number of different packages loaded). </p>
<p>At best, you will <strong>know</strong> which existing functions are masked/overloaded by a newly added package. At worst, you will have no clue until your code breaks.</p>
<p>A couple of examples: </p>
<ul>
<li>try loading <a href="http://cran.at.r-project.org/web/packages/RMySQL/index.html" rel="noreferrer">RMySQL</a> and <a href="http://cran.at.r-project.org/web/packages/RSQLite/index.html" rel="noreferrer">RSQLite</a> at the same time; they don't get along very well</li>
<li>also <a href="http://cran.at.r-project.org/web/packages/RMongo/index.html" rel="noreferrer">RMongo</a> will overwrite certain functions of <a href="http://cran.at.r-project.org/web/packages/RMySQL/index.html" rel="noreferrer">RMySQL</a></li>
<li><a href="http://cran.at.r-project.org/web/packages/forecast/index.html" rel="noreferrer">forecast</a> masks a lot of stuff with respect to ARIMA-related functions</li>
<li><a href="http://cran.at.r-project.org/web/packages/R.utils/index.html" rel="noreferrer">R.utils</a> even masks the <code>base::parse</code> routine </li>
</ul>
<p>(I can't recall which functions in particular were causing the problems, but am willing to look it up again if there's interest)</p>
<p>Surprisingly, this doesn't seem to bother a lot of programmers out there. 
I tried to raise interest a couple of times at <a href="http://tolstoy.newcastle.edu.au/R/e15/devel/11/08/0416.html" rel="noreferrer">r-devel</a>, to little avail.</p>
<h2>Downsides of using the <code>::</code> operator</h2>
<ol>
<li>Using the <code>::</code> operator might significantly hurt efficiency in certain contexts as Dominick Samperi <a href="http://tolstoy.newcastle.edu.au/R/e15/devel/11/10/0716.html" rel="noreferrer">pointed out</a>.</li>
<li>When <strong>developing</strong> your own package, you can't even use the <code>::</code> operator throughout your own code as your code is not a real package yet and thus there's also no namespace yet. So I would have to initially stick to the <code>foo</code> way, build, test and then go back to changing everything to <code>namespace::foo</code>. Not really.</li>
</ol>
<h2>Possible solutions to avoid these problems</h2>
<ol>
<li><strong>Reassign</strong> each function from each package to a variable that follows certain naming conventions, e.g. <code>namespace..foo</code> in order to avoid the inefficiencies associated with <code>namespace::foo</code> (I outlined it once <a href="http://tolstoy.newcastle.edu.au/R/e15/devel/11/08/0416.html" rel="noreferrer">here</a>). Pros: it works. Cons: it's clumsy and you double the memory used.</li>
<li><strong>Simulate</strong> a namespace when developing your package. AFAIU, this is not really possible, at least I was <a href="http://tolstoy.newcastle.edu.au/R/e13/devel/11/01/0039.html" rel="noreferrer">told so back then</a>.</li>
<li>Make it <strong>mandatory</strong> to use <code>namespace::foo</code>. IMHO, that would be the best thing to do. Sure, we would lose some simplicity, but then again the R universe just isn't simple anymore (at least it's not as simple as in the early 2000s).</li>
</ol>
<h2>And what about (formal) classes?</h2>
<p>Apart from the aspects described above, the <code>::</code> approach works quite well for functions/methods. 
But what about class definitions?</p>
<p>Take package <a href="http://cran.at.r-project.org/web/packages/timeDate/index.html" rel="noreferrer">timeDate</a> with its class <code>timeDate</code>. Say another package comes along which also has a class <code>timeDate</code>. I don't see how I could explicitly state that I would like a new instance of class <code>timeDate</code> from either of the two packages. </p>
<p>Something like this will not work:</p>
<pre><code>new(timeDate::timeDate)
new("timeDate::timeDate")
new("timeDate", ns="timeDate")
</code></pre>
<p>That can be a huge problem as more and more people switch to an OOP style for their R packages, leading to lots of class definitions. If there <strong>is</strong> a way to explicitly address the namespace of a class definition, I would very much appreciate a pointer!</p>
<h2>Conclusion</h2>
<p>Even though this was a bit lengthy, I hope I was able to point out the core problem/question and that I can raise more awareness here.</p>
<p>I think <a href="http://cran.at.r-project.org/web/packages/devtools/index.html" rel="noreferrer">devtools</a> and <a href="http://cran.at.r-project.org/web/packages/mvbutils/index.html" rel="noreferrer">mvbutils</a> do have some approaches that might be worth spreading, but I'm sure there's more to say.</p>
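<p>To make the masking problem and the <code>::</code> workaround described above concrete, here is a minimal sketch that only needs base R and <em>stats</em>. The local <code>filter</code> is a hypothetical stand-in for a function exported by a newly attached package, and <code>stats..filter</code> merely follows the <code>namespace..foo</code> naming convention from possible solution 1; neither name comes from a real package. It assumes a fresh session in which nothing else has already masked <code>filter</code>.</p>
<pre><code># Sketch only: assumes a fresh R session with the default packages attached.
x &lt;- c(1, 2, 3, 4)

# Before any masking, the bare name resolves to stats::filter.
before &lt;- filter(x, rep(0.5, 2))

# Hypothetical stand-in for a same-named function from a newly attached
# package: it now sits earlier on the search path than package:stats.
filter &lt;- function(x, ...) "masked"

bare      &lt;- filter(x, rep(0.5, 2))         # silently calls the new function
qualified &lt;- stats::filter(x, rep(0.5, 2))  # still calls the stats version

# Possible solution 1: bind the function once under a prefixed name, so later
# calls need no `::` lookup (stats..filter is an illustrative name).
stats..filter &lt;- stats::filter
rebound &lt;- stats..filter(x, rep(0.5, 2))

# conflicts() lists names that occur more than once on the search path.
masked_names &lt;- conflicts()
</code></pre>
<p>Here <code>bare</code> no longer matches <code>before</code>, although the calling code did not change, while <code>qualified</code> and <code>rebound</code> are unaffected by the masking.</p>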