Conditional stat_summary for ggplot in R
    <p>I'd like to write some conditional stats in my graph if the data is bigger than a certain value.</p>
    <p>With the kind help of Jack Ryan (<a href="https://stackoverflow.com/questions/17066129/cut-data-and-access-groups-to-draw-percentile-lines">Cut data and access groups to draw percentile lines</a>), I could create the following script that groups data into hours and plots the result:</p>
    <pre><code># Read example data
A &lt;- read.csv(url('http://people.ee.ethz.ch/~hoferr/download/data-20130812.csv'))

# Libraries
library(doBy)
library(ggplot2)
library(plyr)
library(reshape2)
library(MASS)
library(scales)

# Sample size function
give.n &lt;- function(x){
  return(c(y = min(x) - 0.2, label = length(x)))
}

# Calculate gaps
gaps &lt;- rep(NA, length(A$Timestamp))
times &lt;- A$Timestamp
loss &lt;- A$pingLoss
gap.start &lt;- 1
gap.end &lt;- 1
for(i in 2:length(A$Timestamp)) { # For all rows
  if(is.na(A$pingRTT.ms.[i])) { # Currently no connection
    if(!is.na(A$pingRTT.ms.[i-1])) { # Connection lost now
      gap.start &lt;- i
    }
    if(!is.na(A$pingRTT.ms.[i+1])) { # Connection restores next time
      gap.end &lt;- i+1
      gaps[gap.start] &lt;- as.numeric(A$Timestamp[gap.end]-A$Timestamp[gap.start], units="secs")
      loss[gap.start] &lt;- gap.end - gap.start
    }
  }
}

H &lt;- data.frame(times, gaps, loss)
H &lt;- H[complete.cases(H),]
C &lt;- H
C$dates &lt;- strptime(C$times, "%Y-%m-%d %H:%M:%S")
C$h1 &lt;- C$dates$hour

# Calculate percentiles
cuts &lt;- c(1, .75, .5, .25, 0)
c &lt;- ddply(C, .(h1), function (x) { summarise(x, y = quantile(x$gaps, cuts)) } )
c$cuts &lt;- cuts
c &lt;- dcast(c, h1 ~ cuts, value.var = "y")
c.melt &lt;- melt(c, id.vars = "h1")

# Hourly plot (max.n is used here but its definition is not shown in this post)
p &lt;- ggplot(c.melt, aes(x = h1, y = value, color = variable)) +
  geom_point(size = 4) +
  stat_summary(fun.data = max.n, geom = "text", fun.y = max, colour = "red", angle = 90, size=4) +
  scale_colour_brewer(palette="RdYlBu", name="Percentile", guide = guide_legend(reverse=TRUE)) +
  scale_x_continuous(breaks=0:23, limits = c(0,23)) +
  annotation_logticks(sides = "lr") +
  theme_bw() +
  scale_y_log10(breaks=c(1e0,1e1,1e2,1e3,1e4), labels = trans_format("log10", math_format(10^.x)), limits=c(1e0,1e4)) +
  xlab("Hour of day") +
  ylab("Ping gaps [s]")
p

# Minutely plot (the construction of c.m1.melt and m1 is not shown in this post)
p &lt;- ggplot(c.m1.melt, aes(x = m1/60, y = value, color = variable)) +
  geom_point(size = 1) +
  stat_summary(fun.data = give.n, geom = "text", fun.y = median, angle = 90, size=4) +
  stat_summary(fun.data = max.n, geom = "text", fun.y = max, colour = "red", angle = 90, size=4) +
  scale_colour_brewer(palette="RdYlBu", name="Percentile", guide = guide_legend(reverse=TRUE)) +
  scale_x_continuous(breaks=0:23, limits = c(0,24)) +
  annotation_logticks(sides = "lr") +
  theme_bw() +
  scale_y_log10(breaks=c(1e0,1e1,1e2,1e3,1e4), labels = trans_format("log10", math_format(10^.x)), limits=c(1e0,1e4)) +
  xlab("Time of day") +
  ylab("Ping gaps [s]")
p
</code></pre>
    <p>This creates an hourly grouped plot of gaps with the length of the longest gaps written right next to the data points:</p>
    <p><img src="https://i.stack.imgur.com/6oS0i.jpg" alt="Hourly grouped plot without sample number per group"></p>
    <p>Below is the minutely grouped plot. The numbers are unreadable, which is why I'd like to add conditional stats: only if the gap is longer than 5 minutes, or only for the ten longest gaps, or something similar.</p>
    <p><img src="https://i.stack.imgur.com/PA288.jpg" alt="Minutely grouped plot with unreadable stats"></p>
    <p>I tried to just change the stat function to</p>
    <pre><code>max.n.filt &lt;- function(x){
  filter = 300
  if ( x &gt; filter ) {
    return(c(y = max(x) + 0.4, label = round(max(10^x),2)))
  } else {
    return(c(y=x, label = ""))
  }
}
</code></pre>
    <p>and use this for the minutely grouped plot. But I got this error:</p>
    <pre><code>Error in list_to_dataframe(res, attr(.data, "split_labels")) :
  Results do not have equal lengths
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Error in if (nrow(layer_data) == 0) return() : argument is of length zero
Calls: print ... print.ggplot -&gt; ggplot_gtable -&gt; Map -&gt; mapply -&gt; &lt;Anonymous&gt;
In addition: Warning message:
Removed 6 rows containing missing values (geom_point).
</code></pre>
    <p>In addition, in the hourly plot, I'd like to write the number of samples per hour right next to the length of the gaps. I think I can add a new column to the c data frame, but unfortunately I can't find a way to do this.</p>
    <p>Any help is very much appreciated.</p>
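One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: `stat_summary()` calls `fun.data` once per x group with the whole vector of y values, so `if (x > filter)` tests a vector, and the `else` branch returns as many `y` entries as the group has points. The results then differ in length from group to group. Because of `scale_y_log10()`, the values also appear to arrive already log10-transformed (the `10^x` in the label suggests this), so a threshold of 300 seconds would be `log10(300)` on that scale. The helper below, with the hypothetical name `max_n_filt` and a hypothetical `threshold` argument, always returns exactly one row and leaves the label empty for groups below the threshold.

```r
# Hypothetical helper (not from the original post): summarise one x group.
# Assumption: x holds the group's y values on the log10 scale, as they
# would be after scale_y_log10() has been applied.
max_n_filt <- function(x, threshold = log10(300)) {
  top <- max(x, na.rm = TRUE)
  # Label only groups whose maximum exceeds the threshold; "" draws nothing.
  label <- if (top > threshold) as.character(round(10^top, 2)) else ""
  # One row per group, so every group's result has the same shape.
  data.frame(y = top + 0.4, label = label, stringsAsFactors = FALSE)
}

# Quick check on made-up log10 values (not the linked data set):
above <- max_n_filt(c(1, 2, 3))    # maximum is 10^3 s, above 300 s
below <- max_n_filt(c(0.5, 1))     # maximum is 10^1 s, below 300 s
print(above)
print(below)
```

In the minutely plot this could stand in for `max.n`, for example `stat_summary(fun.data = max_n_filt, geom = "text", colour = "red", angle = 90, size = 4)`. It has not been run against the linked data.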