Aggregate (count) occurrences of values over arbitrary timeframe
<p>I have a CSV file with timestamps and the event types which happened at each time. What I want is to count the number of occurrences of certain event types in 6-minute intervals.</p>

<p>The input data looks like:</p>

<pre><code>date,type
"Sep 22, 2011 12:54:53.081240000","2"
"Sep 22, 2011 12:54:53.083493000","2"
"Sep 22, 2011 12:54:53.084025000","2"
"Sep 22, 2011 12:54:53.086493000","2"
</code></pre>

<p>I load and cure the data with this piece of code:</p>

<pre><code>&gt; raw_data &lt;- read.csv('input.csv')
&gt; cured_dates &lt;- c(strptime(raw_data$date, '%b %d, %Y %H:%M:%S', tz="CEST"))
&gt; cured_data &lt;- data.frame(cured_dates, c(raw_data$type))
&gt; colnames(cured_data) &lt;- c('date', 'type')
</code></pre>

<p>After curing, the data looks like this:</p>

<pre><code>&gt; head(cured_data)
                 date type
1 2011-09-22 14:54:53    2
2 2011-09-22 14:54:53    2
3 2011-09-22 14:54:53    2
4 2011-09-22 14:54:53    2
5 2011-09-22 14:54:53    1
6 2011-09-22 14:54:53    1
</code></pre>

<p>I read a lot of samples for xts and zoo, but somehow I can't get the hang of it. The output data should look something like:</p>

<pre><code>date                       type   count
2011-09-22 14:54:00 CEST   1      11
2011-09-22 14:54:00 CEST   2      19
2011-09-22 15:00:00 CEST   1      9
2011-09-22 15:00:00 CEST   2      12
2011-09-22 15:06:00 CEST   1      23
2011-09-22 15:06:00 CEST   2      18
</code></pre>

<p>zoo's aggregate function looks promising; I found this code snippet:</p>

<pre><code># aggregate POSIXct seconds data every 10 minutes
tt &lt;- seq(10, 2000, 10)
x &lt;- zoo(tt, structure(tt, class = c("POSIXt", "POSIXct")))
aggregate(x, time(x) - as.numeric(time(x)) %% 600, mean)
</code></pre>

<p>Now I'm just wondering how I could apply this to my use case.</p>

<p>Naive as I am, I tried:</p>

<pre><code>&gt; zoo_data &lt;- zoo(cured_data$type, structure(cured_data$time, class = c("POSIXt", "POSIXct")))
&gt; aggr_data = aggregate(zoo_data$type, time(zoo_data$time), - as.numeric(time(zoo_data$time)) %% 360, count)
Error in `$.zoo`(zoo_data, type) : not possible for univariate zoo series
</code></pre>

<p>I must admit that I'm not really confident in R, but I try. :-)</p>

<p>I'm kinda lost. Could anyone point me in the right direction?</p>

<p>Thanks a lot! Cheers, Alex.</p>

<p>Here is the output of dput for a small subset of my data. The data itself is around 80 million rows.</p>

<pre><code>structure(list(date = structure(c(1316697885, 1316697885, 1316697885,
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
1316697885, 1316697885), class = c("POSIXct", "POSIXt"), tzone = ""),
    type = c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
    2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L)), .Names = c("date",
"type"), row.names = c(NA, -23L), class = "data.frame")
</code></pre>
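The counting described in the question can be sketched in base R without zoo: floor each timestamp to the start of its 6-minute bin, then count rows per (bin, type) pair. This is only a minimal sketch under stated assumptions. The sample rows are made up for illustration, the time zone is fixed to UTC so the result is reproducible, and the names `bin_secs`, `bin_start` and `counts` are introduced here and do not come from the question.

```r
# Illustrative sample in the same shape as cured_data (values are made up)
cured_data <- data.frame(
  date = as.POSIXct(c("2011-09-22 14:54:53", "2011-09-22 14:55:10",
                      "2011-09-22 15:01:00", "2011-09-22 15:02:30",
                      "2011-09-22 15:07:00"), tz = "UTC"),
  type = c(2L, 1L, 2L, 2L, 1L)
)

# Floor each timestamp to the start of its 6-minute (360-second) bin
bin_secs <- 360
bin_start <- as.POSIXct(
  as.numeric(cured_data$date) %/% bin_secs * bin_secs,
  origin = "1970-01-01", tz = "UTC"
)

# Count rows per (bin, type) pair; length() does the counting
counts <- aggregate(
  list(count = cured_data$type),
  by = list(date = bin_start, type = cured_data$type),
  FUN = length
)

# Sort by interval, then by type, to match the layout in the question
counts <- counts[order(counts$date, counts$type), ]
print(counts)
```

The same flooring expression mirrors the `time(x) - as.numeric(time(x)) %% 600` idea from the zoo snippet, with 360 seconds instead of 600. For data on the order of 80 million rows, `aggregate` may be slow; that is a guess rather than a measurement, and a faster grouping tool may be worth trying.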