Note that there are some explanatory texts on larger screens.

plurals
  1. POIs my way of duplicating rows in data.table efficient?
    text
    copied!<p>I have monthly data in one <code>data.table</code> and annual data in another <code>data.table</code> and now I want to match the annual data to the respective observation in the monthly data.</p> <p>My approach is as follows: Duplicating the annual data for every month and then join the monthly and annual data. And now I have a question regarding the duplication of rows. I know how to do it, but I'm not sure if it is the best way to do it, so some opinions would be great.</p> <p>Here is an exemplatory <code>data.table DT</code> for my annual data and how I currently duplicate:</p> <pre><code>library(data.table) DT &lt;- data.table(ID = paste(rep(c("a", "b"), each=3), c(1:3, 1:3), sep="_"), values = 10:15, startMonth = seq(from=1, by=2, length=6), endMonth = seq(from=3, by=3, length=6)) DT ID values startMonth endMonth [1,] a_1 10 1 3 [2,] a_2 11 3 6 [3,] a_3 12 5 9 [4,] b_1 13 7 12 [5,] b_2 14 9 15 [6,] b_3 15 11 18 #1. Alternative DT1 &lt;- DT[, list(MONTH=startMonth:endMonth), by="ID"] setkey(DT, ID) setkey(DT1, ID) DT1[DT] ID MONTH values startMonth endMonth a_1 1 10 1 3 a_1 2 10 1 3 a_1 3 10 1 3 a_2 3 11 3 6 [...] </code></pre> <p>The last join is exactly what I want. However, <code>DT[, list(MONTH=startMonth:endMonth), by="ID"]</code> already does everything I want except adding the other columns to <code>DT</code>, so I was wondering if I could get rid of the last three rows in my code, i.e. the <code>setkey</code> and <code>join</code> operations. It turns out, you can, just do the following:</p> <pre><code>#2. Alternative: More intuitiv and just one line of code DT[, list(MONTH=startMonth:endMonth, values, startMonth, endMonth), by="ID"] ID MONTH values startMonth endMonth a_1 1 10 1 3 a_1 2 10 1 3 a_1 3 10 1 3 a_2 3 11 3 6 ... </code></pre> <p>This, however, only works because I hardcoded the column names into the <code>list</code> expression. In my real data, I do not know the names of all columns in advance, so I was wondering if I could just tell <code>data.table</code> to return the column <code>MONTH</code> that I compute as shown above and all the other columns of <code>DT</code>. <code>.SD</code> seemed to be able to do the trick, but:</p> <pre><code>DT[, list(MONTH=startMonth:endMonth, .SD), by="ID"] Error in `[.data.table`(DT, , list(YEAR = startMonth:endMonth, .SD), by = "ID") : maxn (4) is not exact multiple of this j column's length (3) </code></pre> <p>So to summarize, I know how it's been done, but I was just wondering if this is the best way to do it because I'm still struggling a little bit with the syntax of <code>data.table</code> and often read in posts and on the wiki that there are good and bads ways of doing things. Also, I don't quite get why I get an error when using <code>.SD</code>. I thought it is just any easy way to tell <code>data.table</code> that you want all columns. What do I miss?</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload