Create Sequence Number for a block of records in an R Data Frame
<p>I have a fairly large dataset (by my standards) and I want to create a sequence number for blocks of records. I can use the plyr package, but the execution time is very slow. The code below replicates a comparable size dataframe.</p>
<pre><code>## simulate an example of the size of a normal data frame
N &lt;- 30000
id &lt;- sample(1:17000, N, replace=T)
term &lt;- as.character(sample(c(9:12), N, replace=T))
date &lt;- sample(seq(as.Date("2012-08-01"), Sys.Date(), by="day"), N, replace=T)
char &lt;- data.frame(matrix(sample(LETTERS, N*50, replace=T), N, 50))
val &lt;- data.frame(matrix(rnorm(N*50), N, 50))
df &lt;- data.frame(id, term, date, char, val, stringsAsFactors=F)
dim(df)
</code></pre>
<p>In reality, this is a little smaller than what I work with, as the values are typically larger...but this is close enough.</p>
<p>Here is the execution time on my machine:</p>
<pre><code>&gt; system.time(test.plyr &lt;- ddply(df,
+                                  .(id, term),
+                                  summarise,
+                                  seqnum = 1:length(id),
+                                  .progress="text"))
  |===============================================================================================| 100%
   user  system elapsed
  63.52    0.03   63.85
</code></pre>
<p>Is there a "better" way to do this? Unfortunately, I am on a Windows machine.</p>
<p>Thanks in advance.</p>
<p>EDIT: Data.table is extremely fast, but I can't get my sequence numbers to calc correctly. Here is what my ddply version created. The majority only have one record in the group, but some have 2 rows, 3 rows, etc.</p>
<pre><code>&gt; with(test.plyr, table(seqnum))
seqnum
    1     2     3     4     5
24272  4950   681    88     9
</code></pre>
<p>And using data.table as shown below, the same approach yields:</p>
<pre><code>&gt; with(test.dt, table(V1))
V1
    1
24272
</code></pre>
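For illustration only, here is a minimal base R sketch of one way a within-group sequence number could be computed without plyr. It is not taken from the question or its answers: the `toy` data frame and its values are made up, and `ave()` with `seq_along` is just one possible approach, with no claim about how it compares in speed on the real data.

```r
# Hypothetical toy data standing in for the question's df (values are made up)
toy <- data.frame(
  id   = c(1, 1, 2, 1, 2, 3),
  term = c("9", "9", "9", "10", "9", "9"),
  stringsAsFactors = FALSE
)

# ave() splits the row index by every (id, term) combination, applies
# seq_along() within each group, and returns the results in the original
# row order, so each row gets its position within its own group.
toy$seqnum <- ave(seq_len(nrow(toy)), toy$id, toy$term, FUN = seq_along)

print(toy)
```

In this sketch, the two rows with `id` 1 and `term` "9" are numbered 1 and 2, and the same holds for the two rows with `id` 2 and `term` "9", while the groups with a single row get 1. With data.table, a comparable idiom is `seq_len(.N)` evaluated with `by = list(id, term)`; whether that matches the data.table attempt mentioned in the EDIT is not known from the text here.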