# Methods for doing heatmaps, level / contour plots, and hexagonal binning
The options for 2D plots of (x, y, z) in R are numerous, and grappling with them is a challenge, especially when all three variables are continuous.

To clarify the problem (and possibly assist in explaining why I might be getting tripped up with `contour` or `image`), here is a possible classification scheme:

- Case 1: The value of z is not provided but is a conditional density based on the values in (x, y). (Note: this essentially relegates the calculation of z to a separate function, such as a density estimation. Something still has to use the output of that calculation, so allowing for arbitrary calculations would be nice.)
- Case 2: (x, y) pairs are unique and regularly spaced. This implies that only one value of z is provided per (x, y) value.
- Case 3: (x, y) pairs are unique but continuous. Coloring or shading is still determined by a single z value.
- Case 4: (x, y) pairs are not unique, but are regularly spaced. Coloring or shading is determined by an aggregation function on the z values.
- Case 5: (x, y) pairs are not unique and are continuous. Coloring or shading must be determined by an aggregation function on the z values.

If I am missing some cases, please let me know. The case that interests me is #5. Some notes on the relationships between cases:

- Case #1 seems to be well supported already.
- Case #2 is easily supported by `heatmap`, `image`, and functions in `ggplot`.
- Case #3 is supported by base `plot`, though use of a color gradient is left to the user.
- Case #4 can become case #2 through split & apply functionality; I have done that before.
- Case #5 can be converted to #4 (and then to #2) by using `cut`, but this is inelegant and boxy. Hex binning may be better, though it does not seem to be easily conditioned on whether there is a steep gradient in the value of z. I'd settle for hex binning, but alternative aggregation functions are quite welcome, especially if they can utilize the z values.

How can I do #5? Here is code to produce a saddle; the value of `spread` changes the spread of the z values, which should create differences in the plotted gradients.

```r
N = 1000
spread = 0.6 # Vals: 0.6, 3.0
set.seed(0)
rot = matrix(rnorm(4), ncol = 2)
mat0 = matrix(rnorm(2 * N), ncol = 2)
mat1 = mat0 %*% rot
zMean = mat0[, 2]^2 - mat0[, 1]^2
z = rnorm(N, mean = zMean, sd = spread * median(abs(zMean)))
```

I'd like to do something like `hexbin`, but I've banged on this with `ggplot` and haven't made much progress. If I could apply an arbitrary aggregation function to the z values in a region, that would be even better. (The form of such a function might be like `plot(mat1, colorGradient = f(z), aggregation = "bin", bins = 50)`.)
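For reference, a minimal sketch of exactly this kind of binned aggregation, using ggplot2's `stat_summary_hex`, which applies a user-supplied summary function to the z values falling in each hexagonal bin. It assumes the ggplot2 and hexbin packages are installed, and reuses `mat1` and `z` from the saddle code above:

```r
# Hedged sketch of case #5: hexagonal binning with an arbitrary
# aggregation function on z. Assumes ggplot2 (plus the hexbin package,
# which stat_summary_hex needs) and the mat1/z objects created above.
library(ggplot2)

df <- data.frame(x = mat1[, 1], y = mat1[, 2], z = z)

# stat_summary_hex() bins (x, y) into hexagons and summarizes the z values
# in each bin with `fun`; swap median for mean, sd, or any other statistic.
ggplot(df, aes(x = x, y = y, z = z)) +
  stat_summary_hex(fun = median, bins = 50)

# stat_summary_2d() is the rectangular-bin analogue of the same idea.
ggplot(df, aes(x = x, y = y, z = z)) +
  stat_summary_2d(fun = median, bins = 50)
```

Here `fun` is the arbitrary aggregation hook asked for above, and `bins` plays the role of the hypothetical `bins = 50` parameter.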
How can I do this in ggplot or another package? I am happy to make this question a community wiki question (or other users can, by editing it enough times). If so, one answer per post, please, so that we can focus on, say, `ggplot`, `levelplot`, `lattice`, `contourplot` (or `image`), and other options, if they exist.

---

**Update 1:** The [volcano example](http://had.co.nz/ggplot2/stat_contour.html) is a good example of case #3: the data are regularly spaced (they could be lat/long), with one z value per observation. A topographic map has (latitude, longitude, altitude), and thus one value per location. Suppose instead that one is collecting weather data (e.g. rainfall, wind speed, sunlight) over many days from many randomly placed sensors: that is more akin to #5 than to #3. We may have lat & long, but the z values can range quite a bit, even for the same or nearby (x, y) values.

**Update 2:** The answers so far, by DWin, Kohske, and John Colby, are all excellent. My actual data set is a small sample of a larger one, but at 200K points it produces interesting results. On the (x, y) plane, it has very high density in some regions (so overplotting would occur there) and much lower density or complete absence in others. With John's suggestion via `fields`, I needed to subsample the data for `Tps` to work (I'll investigate whether I can avoid subsampling), but the results are quite interesting. Trying `rms`/`Hmisc` (DWin's suggestion), the full 200K points seem to work out well. Kohske's suggestion is quite good, and, since the data are transformed onto a grid before plotting, there is no issue with the number of input data points. It also gives me greater flexibility in determining how to aggregate the z values in each region. I am not yet sure whether I will use mean, median, or some other aggregation.

I also intend to try out Kohske's nice example of `mutate` + `ddply` with the other methods; it is a good example of how to compute different statistics over a given region (sketches of this and of the `fields` route appear after Update 3).

---

**Update 3:** The different methods are distinct, and several are remarkable, though there isn't a clear winner. I selected John Colby's answer as the first. I think I will use that or DWin's method in further work.
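To make the grid-aggregation idea concrete, here is a minimal sketch in the spirit of the `mutate` + `ddply` approach mentioned in Update 2 (my own illustration, not Kohske's actual code); it assumes the plyr and ggplot2 packages and reuses `mat1` and `z` from the saddle example:

```r
# Sketch of case #5 via rectangular grid aggregation: bin x and y with
# cut(), aggregate z per cell with ddply(), and draw the grid with
# geom_tile(). Illustrative only -- not the original answer's code.
library(plyr)
library(ggplot2)

df <- data.frame(x = mat1[, 1], y = mat1[, 2], z = z)
nbins <- 50

# Assign each point to a grid cell; labels = FALSE gives integer bin indices.
df <- mutate(df,
             xbin = cut(x, breaks = nbins, labels = FALSE),
             ybin = cut(y, breaks = nbins, labels = FALSE))

# Aggregate z within each cell; any statistic (mean, median, sd, ...) works.
agg <- ddply(df, .(xbin, ybin), summarise, zAgg = median(z))

ggplot(agg, aes(x = xbin, y = ybin, fill = zAgg)) +
  geom_tile()
```

Because the aggregation happens before plotting, the number of raw points only affects the `ddply` step, which matches the observation in Update 2 that the grid approach has no trouble with 200K input points.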
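And a minimal sketch of the `fields`/`Tps` route from Update 2 (again my illustration rather than John Colby's exact answer; it assumes the fields package and the same `mat1` and `z`):

```r
# Sketch of the fields/Tps approach: fit a thin-plate spline to z over the
# (x, y) locations and plot the fitted surface. Illustrative only.
library(fields)

# Tps() solves a dense system, so it scales poorly with n -- hence the
# subsampling mentioned in Update 2 for the 200K-point data set.
fit <- Tps(mat1, z)

# surface() draws an image-plus-contour plot of the fitted smooth surface.
surface(fit)
```

Unlike the binning approaches, this smooths z rather than aggregating it, so steep gradients show up as closely spaced contours rather than as sharp cell boundaries.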