<h1>Understanding dates and plotting a histogram with ggplot2 in R</h1>
<h2>Main Question</h2>
<p>I'm having trouble understanding why the handling of dates, labels, and breaks doesn't work the way I expected when I try to make a histogram with ggplot2 in R.</p>
<p><strong>I'm looking for:</strong></p>
<ul>
<li>A histogram of the frequency of my dates</li>
<li>Tick marks centered under the matching bars</li>
<li>Date labels in <code>%Y-%b</code> format</li>
<li>Appropriate limits, with minimal empty space between the edge of the plotting area and the outermost bars</li>
</ul>
<p>I've <a href="http://pastebin.com/sDzXKFxJ" rel="nofollow noreferrer">uploaded my data to pastebin</a> to make this reproducible. I've created several columns because I wasn't sure which would work best:</p>
<pre><code>&gt; dates &lt;- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=TRUE)
&gt; head(dates)
        YM       Date Year Month
1 2008-Apr 2008-04-01 2008     4
2 2009-Apr 2009-04-01 2009     4
3 2009-Apr 2009-04-01 2009     4
4 2009-Apr 2009-04-01 2009     4
5 2009-Apr 2009-04-01 2009     4
6 2009-Apr 2009-04-01 2009     4
</code></pre>
<p>Here's what I tried:</p>
<pre><code>library(ggplot2)
library(scales)
# Convert the character/factor column to Date so ggplot2 uses a date scale
dates$converted &lt;- as.Date(dates$Date, format="%Y-%m-%d")
ggplot(dates, aes(x=converted)) + geom_histogram() +
  theme(axis.text.x = element_text(angle=90))
</code></pre>
<p>Which yields <a href="https://i.imgur.com/rks0y.png" rel="nofollow noreferrer">this graph</a>. I wanted <code>%Y-%b</code> formatting, though, so I hunted around and tried the following, based on <a href="https://stackoverflow.com/questions/10576095/formatting-dates-with-scale-x-date-in-ggplot2">this SO question</a>:</p>
<pre><code>ggplot(dates, aes(x=converted)) + geom_histogram() +
  scale_x_date(labels=date_format("%Y-%b"), breaks = "1 month") +
  theme(axis.text.x = element_text(angle=90))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
</code></pre>
<p>That gives me <a href="https://i.stack.imgur.com/HaGsV.png" rel="nofollow noreferrer">this graph</a>:</p>
<ul>
<li>The x-axis label format is correct</li>
<li>The frequency distribution has changed shape (a binwidth issue?)</li>
<li>The tick marks don't appear centered under the bars</li>
<li>The x-axis limits have changed as well</li>
</ul>
<p>I worked through the example in the <code>scale_x_date</code> section of the <a href="http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf" rel="nofollow noreferrer">ggplot2 documentation</a>, and <code>geom_line()</code> appears to break, label, and center ticks correctly when I use it with the same x-axis data. I don't understand why the histogram behaves differently.</p>
<hr>
<h2>Updates based on answers from edgester and gauden</h2>
<p>I initially thought gauden's answer had solved my problem, but I'm now puzzled after looking more closely. Note the differences between the graphs the two approaches produce, shown after the code.</p>
<p>Assume for both:</p>
<pre><code>library(ggplot2)
library(scales)
dates &lt;- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=TRUE)
</code></pre>
<p>Based on @edgester's answer below, I was able to do the following:</p>
<pre><code># Count observations per date, then plot the precomputed counts as bars
freqs &lt;- aggregate(dates$Date, by=list(dates$Date), FUN=length)
freqs$names &lt;- as.Date(freqs$Group.1, format="%Y-%m-%d")
ggplot(freqs, aes(x=names, y=x)) +
  geom_bar(stat="identity") +
  scale_x_date(breaks="1 month", labels=date_format("%Y-%b"),
               limits=c(as.Date("2008-04-30"), as.Date("2012-04-01"))) +
  ylab("Frequency") + xlab("Year and Month") +
  theme_bw() + theme(axis.text.x = element_text(angle=90))
</code></pre>
<p>Here is my attempt based on gauden's answer:</p>
<pre><code>dates$Date &lt;- as.Date(dates$Date)
# Let geom_histogram bin the raw dates into fixed 30-day bins
ggplot(dates, aes(x=Date)) +
  geom_histogram(binwidth=30, colour="white") +
  scale_x_date(labels = date_format("%Y-%b"),
               breaks = seq(min(dates$Date)-5, max(dates$Date)+5, 30),
               limits = c(as.Date("2008-05-01"), as.Date("2012-04-01"))) +
  ylab("Frequency") +
  xlab("Year and Month") + theme_bw() +
  theme(axis.text.x = element_text(angle=90))
</code></pre>
<p>Plot based on edgester's approach:</p>
<p><img src="https://i.stack.imgur.com/SQB95.png" alt="edgester-plot"></p>
<p>Plot based on gauden's approach:</p>
<p><img src="https://i.stack.imgur.com/qvXN5.png" alt="gauden-plot"></p>
<p>Note the following:</p>
<ul>
<li>There are gaps in gauden's plot for 2009-Dec and 2010-Mar, yet <code>table(dates$Date)</code> shows 19 instances of <code>2009-12-01</code> and 26 instances of <code>2010-03-01</code> in the data.</li>
<li>edgester's plot starts at 2008-Apr and ends at 2012-May, which is correct given a minimum date of 2008-04-01 and a maximum date of 2012-05-01 in the data. For some reason, gauden's plot starts in 2008-Mar yet still ends at 2012-May. After counting bins and reading along the month labels, for the life of me I can't figure out which plot has an extra bin or is missing one!</li>
</ul>
<p>Any thoughts on the differences here? edgester's method of creating a separate count</p>
<hr>
<h2>Related References</h2>
<p>As an aside, here are other places with information about dates and ggplot2 for passers-by looking for help:</p>
<ul>
<li><a href="http://learnr.wordpress.com/2010/02/25/ggplot2-plotting-dates-hours-and-minutes/" rel="nofollow noreferrer">I started here</a> at learnr.wordpress, a popular R blog. It stated that I needed to get my data into POSIXct format, which I now think is false; following it wasted my time.</li>
<li><a href="http://learnr.wordpress.com/2009/05/05/ggplot2-two-time-series-with-different-dates/" rel="nofollow noreferrer">Another learnr post</a> recreates a time series in ggplot2, but it wasn't really applicable to my situation.</li>
<li><a href="http://www.r-bloggers.com/plotting-time-series-data-using-ggplot2/" rel="nofollow noreferrer">r-bloggers has a post on this</a>, but it appears outdated.
The simple <code>format=</code> option did not work for me.</li>
<li><a href="https://stackoverflow.com/questions/6638696/breaks-for-scale-x-date-in-ggplot2-and-r">This SO question</a> deals with breaks and labels. I tried treating my <code>Date</code> vector as continuous, and I don't think it worked well: it seemed to overlay the same label text over and over, so the letters looked odd. The distribution is roughly correct, but the breaks are strange. My attempt based on the accepted answer produced <a href="https://i.stack.imgur.com/ntDPD.png" rel="nofollow noreferrer">this result</a>.</li>
</ul>
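<p>One possible contributor to the gaps: calendar months are 28 to 31 days long, so fixed 30-day bins slowly drift against month starts, which can leave some bins empty while others collect two month starts. The base-R sketch below is an illustration under stated assumptions, not a reproduction of ggplot2's internals. It uses a hypothetical vector with one date on the first of every month from 2008-04-01 to 2012-05-01 (not the pastebin data) and compares <code>cut()</code> with <code>breaks = "month"</code> against 30-day breaks. The 30-day origin (the last multiple of 30 days since 1970-01-01 on or before the first date) is an assumption chosen for illustration; where ggplot2 actually places its bins depends on its version and arguments.</p>

```r
# Hypothetical data, NOT the pastebin data: one date on the first of every
# month across the question's range (2008-04-01 to 2012-05-01), 50 dates.
month_starts <- seq(as.Date("2008-04-01"), as.Date("2012-05-01"), by = "month")

# Calendar-month bins: cut() on a Date vector with breaks = "month" makes one
# left-closed interval per month, so each month start gets its own bin.
by_month <- table(cut(month_starts, breaks = "month"))

# Fixed 30-day bins. The origin is an ASSUMPTION for illustration: the last
# multiple of 30 days (counted from 1970-01-01) on or before the first date.
origin_days <- floor(as.numeric(min(month_starts)) / 30) * 30
bin_origin <- as.Date(origin_days, origin = "1970-01-01")
breaks_30 <- seq(bin_origin, max(month_starts) + 30, by = 30)
by_30_days <- table(cut(month_starts, breaks = breaks_30))

# How many bins hold 0, 1 or 2 month starts under each scheme?
print(table(as.vector(by_month)))
print(table(as.vector(by_30_days)))

# Bins whose count differs from 1, labelled by each bin's start date.
print(by_30_days[by_30_days != 1])
```

<p>Under that assumed origin, every calendar-month bin holds exactly one date, while the 30-day bins should include two empty bins (starting 2009-12-02 and 2010-03-02) and one bin starting 2010-01-31 that holds both 2010-02-01 and 2010-03-01; the first bin would also start in March (2008-03-12). That pattern resembles the gaps and the 2008-Mar start described above, though it does not prove ggplot2 used the same origin. Counting per date first, as in edgester's <code>aggregate()</code> approach, avoids fixed-width binning altogether.</p>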