How do you combine "Revision Control" with "Workflow" for R?
    <p>I remember coming across R users writing that they use "Revision control" (<a href="https://stackoverflow.com/questions/1056912/source-control-vs-revision-control">e.g. "Source control"</a>), and I am curious to know: How do you combine "Revision control" with your statistical analysis workflow?</p> <p>Two (very) interesting discussions talk about how to deal with the workflow, but neither of them refers to the revision control element:</p> <ul> <li><a href="https://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs">How to organize large R programs?</a></li> <li><a href="https://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing">Workflow for statistical analysis and report writing</a></li> </ul> <p><strong>A Long Update To The Question</strong>: Following some of the answers, and Dirk's question in the comment, I would like to focus my question a bit more.</p> <p>After reading the Wikipedia article about "<a href="http://en.wikipedia.org/wiki/Revision_control" rel="nofollow noreferrer">revision control</a>" (which I was previously not familiar with), it was clear to me that when using revision control, what one does is build a <strong>development structure</strong> of one's code. This structure either leads to a "final product" or to several branches.</p> <p>When building something like, say, a website, there is usually one end product you work towards (the website), with some prototypes along the way.</p> <p>But when doing a statistical analysis, the work (in my view) is different. Sometimes you know where you want to get to. But more often, you explore. Explore cleaning the dataset. 
Explore different methods for statistical analysis, and ask various questions of your data (and I am writing this knowing how Frank Harrell and other experienced statisticians feel about <a href="http://en.wikipedia.org/wiki/Data_dredging" rel="nofollow noreferrer">data dredging</a>).</p> <p>That is why the workflow question with statistical programming is (in my view) a serious and deep question, raising many issues. The simpler ones are technical:</p> <ul> <li>Which revision control software do you use (and why)?</li> <li>Which IDE do you use (and why)?</li> </ul> <p>The more interesting questions are about work process:</p> <ul> <li>How do you structure your files?</li> <li>What do you keep as a separate file and what as a revision? Or, asking in a different way: what should be a "branch" and what should be a "sub project" in your code? For example: when starting to explore your data, should a plot be created and then erased because it didn't lead anywhere (but kept as a revision), or should there be a backup file of that path?</li> </ul> <p>How <strong>you</strong> solve this tension was my initial curiosity. The second question is "what might I be missing?". What rules (of thumb) should one follow so as to avoid common pitfalls when doing statistical programming with version control?</p> <p>My <strong>intuition</strong> is that statistical programming is inherently different from software development (I am writing this without being a real expert in statistical programming, and even less so in software development). That's why I am unsure which of the lessons I have read here about version control would be applicable.</p> <p>Thanks a lot, Tal</p>
 
