<h2>Just a quick sum up.</h2>
<p><code>&lt;-</code> with <code>data.table</code> is just like base; i.e., no copy is taken until a subassign is done afterwards with <code>&lt;-</code> (such as changing the column names or changing an element such as <code>DT[i,j]&lt;-v</code>). Then it takes a copy of the whole object just like base. That's known as copy-on-write. Would be better known as copy-on-subassign, I think! It DOES NOT copy when you use the special <code>:=</code> operator, or the <code>set*</code> functions provided by <code>data.table</code>. If you have large data you probably want to use them instead. <code>:=</code> and <code>set*</code> will NOT COPY the <code>data.table</code>, EVEN WITHIN FUNCTIONS.</p>
<p>Given this example data :</p>
<pre><code>DT &lt;- data.table(a=c(1,2), b=c(11,12))
</code></pre>
<p>The following just "binds" another name <code>DT2</code> to the same data object currently bound to the name <code>DT</code> :</p>
<pre><code>DT2 &lt;- DT
</code></pre>
<p>This never copies, and never copies in base either. It just marks the data object so that R knows that two different names (<code>DT2</code> and <code>DT</code>) point to the same object. And so R will need to copy the object if either is <em>subassigned</em> to afterwards.</p>
<p>That's perfect for <code>data.table</code>, too. The <code>:=</code> isn't for doing that. So the following is a deliberate error, as <code>:=</code> isn't for just binding object names :</p>
<pre><code>DT2 := DT    # not what := is for, not defined, gives a nice error
</code></pre>
<p><code>:=</code> is for <em>subassigning</em> by reference. But you don't use it like you would in base :</p>
<pre><code>DT[3,"foo"] := newvalue    # not like this
</code></pre>
<p>you use it like this :</p>
<pre><code>DT[3,foo:=newvalue]    # like this
</code></pre>
<p>That changed <code>DT</code> by reference. Say you add a new column <code>new</code> by reference to the data object; there is no need to do this :</p>
<pre><code>DT &lt;- DT[,new:=1L]
</code></pre>
<p>because the RHS already changed <code>DT</code> by reference. Writing the extra <code>DT &lt;-</code> misunderstands what <code>:=</code> does. You can write it there, but it's superfluous.</p>
<p><code>DT</code> is changed by reference, by <code>:=</code>, EVEN WITHIN FUNCTIONS :</p>
<pre><code>f &lt;- function(X){
    X[,new2:=2L]
    return("something else")
}
f(DT)   # will change DT

DT2 &lt;- DT
f(DT)   # will change both DT and DT2 (they're the same data object)
</code></pre>
<p><code>data.table</code> is for large datasets, remember. If you have a 20GB <code>data.table</code> in memory then you need a way to do this. It's a very deliberate design decision of <code>data.table</code>.</p>
<p>Copies can be made, of course. You just need to tell data.table that you're sure you want to copy your 20GB dataset, by using the <code>copy()</code> function :</p>
<pre><code>DT3 &lt;- copy(DT)   # rather than DT3 &lt;- DT
DT3[,new3:=3L]     # now, this just changes DT3 because it's a copy, not DT too.
</code></pre>
<p>To avoid copies, don't use base-style assignment or update :</p>
<pre><code>DT$new4 &lt;- 1L                 # will make a copy, so use :=
attr(DT,"sorted") &lt;- "a"      # will make a copy, so use setattr()
</code></pre>
<p>If you want to be sure that you are updating by reference, use <code>.Internal(inspect(x))</code> and look at the memory address values of the constituents (see Matthew Dowle's answer).</p>
<p>Writing <code>:=</code> in <code>j</code> like that allows you to subassign by reference <em>by group</em>. You can add a new column by reference by group. So that's why <code>:=</code> is done that way inside <code>[...]</code> :</p>
<pre><code>DT[, newcol:=mean(x), by=group]
</code></pre>
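<p>As a rough way to check the points above for yourself, here is a minimal sketch. It assumes the <code>data.table</code> package is installed and uses its <code>address()</code> helper as a lighter-weight alternative to <code>.Internal(inspect(x))</code>; the names <code>same_before</code>, <code>seen_in_DT2</code> and <code>leaked_to_DT</code> are only illustrative and not part of any API.</p>
<pre><code>library(data.table)

DT  &lt;- data.table(a = c(1, 2), b = c(11, 12))
DT2 &lt;- DT                      # binds a second name to the same object, no copy

# Both names should report the same memory address at this point.
same_before &lt;- identical(address(DT), address(DT2))

# Add columns by reference: once with :=, once with set().
DT[, new := 1L]
set(DT, j = "new2", value = 2L)

# DT2 is the same object, so it should see both new columns.
seen_in_DT2 &lt;- all(c("new", "new2") %in% names(DT2))

# An explicit copy() is independent: changing it should leave DT alone.
DT3 &lt;- copy(DT)
DT3[, new3 := 3L]
leaked_to_DT &lt;- "new3" %in% names(DT)

print(c(same_before = same_before,
        seen_in_DT2 = seen_in_DT2,
        leaked_to_DT = leaked_to_DT))
</code></pre>
<p>If the behaviour is as described, the first two flags come back <code>TRUE</code> and the last one <code>FALSE</code>. Comparing addresses only shows whether two names share one object; it says nothing about how individual columns are stored.</p>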