StackOverflow2013


plurals

PO
primarykey
Id
12225669
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2012-09-01T07:02:18.373
FavoriteCount
0
LastActivityDate
2015-09-07T00:27:34.653
LastEditDate
2015-09-07T00:27:34.653
LastEditorUserId
845048
OwnerUserId
845048
ParentId
12222469
PostTypeId
2
Score
52
ViewCount
0
LastEditorDisplayName
text
Body
I chose HBase because it scales. Whisper is much like RRD: it's a fixed-size database, so it must destroy data in order to work within its space constraints. HBase offers the following properties that make it very well suited for large-scale time series databases: <ol> <li>Linear scaling. Want to store more data? Add more nodes. At StumbleUpon, where I wrote OpenTSDB, our time series data was co-located on a 20-node cluster that was primarily used for analytics and batch processing. The cluster grew to 120 nodes fairly quickly, and meanwhile OpenTSDB, which makes up only a tiny fraction of the cluster's workload, grew to half a trillion data points.</li> <li>Automatic replication. Your data is stored in HDFS, which by default means 3 replicas on 3 different machines. If a machine or a drive dies, it's no big deal. Drives and machines die all the time when you build on commodity servers, but the thing is: you don't really care.</li> <li>Efficient scans. Most time series data is used to answer questions like "what are the data points between time X and Y?". If you structure your keys properly, you can implement this very efficiently in HBase with a simple scan operation.</li> <li>High write throughput. The <a href="http://research.google.com/archive/bigtable.html">Bigtable design</a>, which HBase follows, uses <a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree">LSM trees</a> instead of, say, B-trees, to make writes cheaper (at the expense of potentially more expensive reads).</li> </ol> The fact that HBase is column-oriented wasn't nearly as important a consideration as the fact that it's a big sorted key-value system that really scales. None of the RRD-based and RRD-derived tools could satisfy the scale requirement of accurately storing billions and billions of data points forever, very cheaply (just a few bytes of actual disk space per data point).
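The "efficient scans" point above can be sketched in a few lines. This is only an illustration of the idea, not OpenTSDB's actual schema or the HBase client API: `SortedStore`, `row_key`, and the one-hour `BUCKET_SECONDS` are hypothetical names and values chosen for the example, and an in-memory sorted list stands in for a sorted key-value table. The key puts the metric name first and a time bucket second, so all points for one metric and one time range sit in a contiguous run of keys.

```python
import bisect
import struct

BUCKET_SECONDS = 3600  # assumption for this sketch: one row per metric per hour


def row_key(metric: str, timestamp: int) -> bytes:
    """Build a key that sorts by metric first, then by time bucket."""
    base = timestamp - timestamp % BUCKET_SECONDS
    # Big-endian unsigned int, so byte order matches numeric order.
    return metric.encode("utf-8") + b"\x00" + struct.pack(">I", base)


class SortedStore:
    """Toy stand-in for a sorted key-value table (not the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {offset within bucket: value}

    def put(self, metric, timestamp, value):
        key = row_key(metric, timestamp)
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key][timestamp % BUCKET_SECONDS] = value

    def scan(self, metric, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        # Only the contiguous slice of keys between the two bounds is read.
        lo = bisect.bisect_left(self._keys, row_key(metric, start))
        hi = bisect.bisect_right(self._keys, row_key(metric, end))
        out = []
        for key in self._keys[lo:hi]:
            base = struct.unpack(">I", key[-4:])[0]
            for offset, value in sorted(self._rows[key].items()):
                ts = base + offset
                if start <= ts < end:
                    out.append((ts, value))
        return out
```

A call such as `store.scan("cpu", x, y)` touches only the rows whose keys fall between the two bounds, regardless of how many other metrics or time ranges the store holds. That is the property a range scan over a sorted table relies on.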
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POWhy OpenTSDB chose HBase for Time Series data storage?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. UStsuna
UserOwnerUserId
1. UStsuna
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data is linked in either a to-one or a to-many fashion. Captions of to-many data link to a list view showing a filtered view of the related table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.