StackOverflow2013

<p>Here is an example along the lines of the data you have, loaded into R, aggregated, and plotted.</p>
<p>First, some dummy data to write out to a file:</p>
<pre><code>stime <- as.POSIXct("2011-01-01-00:00:00", format = "%Y-%m-%d-%H:%M:%S")
## dummy data
dat <- data.frame(Timestamp = seq(from = stime, by = 5, length.out = 2000000),
                  DD1 = sample(1:1000, replace = TRUE),
                  DD2 = sample(1:1000, replace = TRUE),
                  DD3 = sample(1:1000, replace = TRUE),
                  DD4 = sample(1:1000, replace = TRUE))
## write it out
write.csv(dat, file = "timestamp_data.txt", row.names = FALSE)
</code></pre>
<p>Then we can time reading in the 2 million rows. To speed this up, we tell R the classes of the columns in the file: <code>"POSIXct"</code> is one way in R to store the sort of timestamps you have.</p>
<pre><code>## read it in:
system.time({
    tsdat <- read.csv("timestamp_data.txt", header = TRUE,
                      colClasses = c("POSIXct", rep("integer", 4)))
})
</code></pre>
<p>which takes about 13.7 seconds of user time (roughly 20 seconds elapsed) to read in and convert to internal Unix times on my modest laptop.</p>
<pre><code>   user  system elapsed
 13.698   5.827  19.643
</code></pre>
<p>Aggregation can be done in many ways; one is to use <code>aggregate()</code>.
For example, to aggregate to hourly means:</p>
<pre><code>## Generate some indexes that we'll aggregate over
tsdat <- transform(tsdat,
                   hours = factor(strftime(tsdat$Timestamp, format = "%H")),
                   jday = factor(strftime(tsdat$Timestamp, format = "%j")))
## compute the mean of the 4 variables for each hour of each day
out <- aggregate(cbind(Timestamp, DD1, DD2, DD3, DD4) ~ hours + jday,
                 data = tsdat, FUN = mean)
## convert average Timestamp to a POSIX time
out <- transform(out,
                 Timestamp = as.POSIXct(Timestamp,
                                        origin = ISOdatetime(1970,1,1,0,0,0)))
</code></pre>
<p>That (the line creating <code>out</code>) takes ~16 seconds on my laptop, and gives the following output:</p>
<pre><code>> head(out)
  hours jday           Timestamp      DD1      DD2      DD3      DD4
1    00  001 2010-12-31 23:29:57 500.2125 491.4333 510.7181 500.4833
2    01  001 2011-01-01 00:29:57 516.0472 506.1264 519.0931 494.2847
3    02  001 2011-01-01 01:29:57 507.5653 499.4972 498.9653 509.1389
4    03  001 2011-01-01 02:29:57 520.4111 500.8708 514.1514 491.0236
5    04  001 2011-01-01 03:29:57 498.3222 500.9139 513.3194 502.6514
6    05  001 2011-01-01 04:29:57 515.5792 497.1194 510.2431 496.8056
</code></pre>
<p>Simple plotting can be achieved using the <code>plot()</code> function:</p>
<pre><code>plot(DD1 ~ Timestamp, data = out, type = "l")
</code></pre>
<p>We can overlay more variables via, e.g.:</p>
<pre><code>ylim <- with(out, range(DD1, DD2))
plot(DD1 ~ Timestamp, data = out, type = "l", ylim = ylim)
lines(DD2 ~ Timestamp, data = out, type = "l", col = "red")
</code></pre>
<p>or via multiple panels:</p>
<pre><code>layout(1:2)
plot(DD1 ~ Timestamp, data = out, type = "l", col = "blue")
plot(DD2 ~ Timestamp, data = out, type = "l", col = "red")
layout(1)
</code></pre>
<p>This has all been done with base R functionality. Others have shown how add-on packages can make working with dates easier.</p>
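<p>As a variation on the aggregation step, the hourly bins can also be derived directly from the timestamps with <code>trunc()</code>, which avoids building separate hour and day-of-year factors and converting an averaged numeric timestamp back to a date-time. The following is only a minimal sketch on a small made-up data set; the names <code>small</code> and <code>hourly</code>, the single <code>DD1</code> column, and the UTC time zone are assumptions chosen for the illustration.</p>
<pre><code>## small illustrative data set (assumed): 5-second readings covering 3 hours
set.seed(42)
stime <- as.POSIXct("2011-01-01 00:00:00", tz = "UTC")
small <- data.frame(Timestamp = seq(from = stime, by = 5, length.out = 2160),
                    DD1 = sample(1:1000, 2160, replace = TRUE))
## truncate each timestamp to the start of its hour;
## trunc() returns POSIXlt, so convert back to POSIXct
small$hour <- as.POSIXct(trunc(small$Timestamp, units = "hours"))
## mean of DD1 within each hourly bin
hourly <- aggregate(small["DD1"], by = list(hour = small$hour), FUN = mean)
</code></pre>
<p>Here <code>hourly</code> has one row per hour, and its <code>hour</code> column is still a date-time, so it can be plotted in the same way as before, e.g. <code>plot(DD1 ~ hour, data = hourly, type = "l")</code>.</p>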
