StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
17266183
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2013-06-23T23:16:27.463
FavoriteCount
0
LastActivityDate
2013-06-25T17:02:32.720
LastEditDate
2013-06-25T17:02:32.720
LastEditorUserId
2246674
OwnerUserId
2246674
ParentId
17265957
PostTypeId
2
Score
8
ViewCount
0
LastEditorDisplayName
text
Body
While the only way to objectively answer this question is with performance tests, I will argue for using a Hashtable Map. (Caching and memory access can be so full of surprises; I do not have the expertise to speculate on which one will be faster, and when. Also consider that localized performance differences may be marginalized by other code.) My first reason for "initially choosing" a hash is based off of the observation that there are 100M distinct keys but only 0.1M active records. This means that if using an array, index utilization will only be 0.1% - this is a very sparse array. If the data is stored as values in the array then it needs to be relatively small or the array size will balloon. If the data is not stored in the array (e.g. array is of pointers) then the argument for locality of data in the array is partially mitigated. Either way, the simple array approach requires lots of unused space. Since all the keys are already integers, the distribution (hash) function and can be efficiently implemented - there is no need to create a hash of a complex type/sequence so the "cost" of this function should approach zero. So, my simple proposed hash: <ul> <li>Use linear probing backed by contiguous memory. It is simple, has good locality (especially during the probe), and avoids needing to do any form of dynamic allocation.</li> <li>Pick a suitable initial bucket size; say, 2x (or 0.2M buckets, primed). Don't even give the hash a chance of resize. Note that this suggested bucket array size is only 0.2% the size of the simple array approach and could be reduced further as the size vs. collision rate can be tuned.</li> <li>Create a good distribution function for the hash. It can also exploit knowledge of the ID range.</li> </ul> While I've presented specialized hashtable rules "optimized" for the given case, I would start with a normal Map implementation (be it a hashtable or tree) and test it .. if a standard implementation works suitably well, why not use it? Now, test different candidates under expected and extreme loads - and pick the winner.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POif huge array is faster than hash-map for look-up?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USuser2246674
UserOwnerUserId
1. USuser2246674
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.