StackOverflow2013


plurals

How to make a fast dictionary that contains another dictionary?
primarykey
Id
17770166
data
AcceptedAnswerId
17772760
AnswerCount
3
ClosedDate
CommentCount
9
CommunityOwnedDate
CreationDate
2013-07-21T07:44:14.413
FavoriteCount
0
LastActivityDate
2013-07-22T23:22:36.093
LastEditDate
2013-07-21T07:54:16.200
LastEditorUserId
541686
OwnerUserId
541686
ParentId
0
PostTypeId
1
Score
1
ViewCount
167
LastEditorDisplayName
text
Body
I have a <code>map<size_t, set<size_t>></code>, which, for better performance, I'm actually representing as a lexicographically sorted <code>vector<pair<size_t, vector<size_t>>></code>. What I need is a <code>set<T></code> with fast insertion times (removal doesn't matter), where <code>T</code> is the data type above, so that I can check for duplicates (my program runs until no more unique <code>T</code>s are being generated). So far, switching from <code>set</code> to <code>unordered_set</code> has turned out to be quite beneficial (it makes my program run more than 25% faster), but even now, inserting <code>T</code> still seems to be one of the main bottlenecks. The maximum number of integers in a given <code>T</code> is around 1000, and each integer is also at most about 1000, so the numbers are quite small (but there are thousands of these <code>T</code>s being generated). What I have already tried: <ul> <li>Using <code>unsigned short</code>. It actually decreases performance slightly.</li> <li>Using Google's <a href="https://code.google.com/p/cpp-btree/source/browse/btree_map.h" rel="nofollow"><code>btree::btree_map</code></a>. It's actually much slower because I have to work around the iterator invalidation. (I have to copy the keys, and I think that's why it turned out slow. It was at least twice as slow.)</li> <li>Using a different hash function. I haven't found any measurable difference as long as I use something reasonable, so it seems like this can't be improved.</li> </ul> What I have not tried: <ul> <li>Storing "fingerprints"/hashes instead of the actual sets. This sounds like the perfect solution, except that the fingerprinting function needs to be fast, and I need to be extremely confident that collisions won't happen, or they'll screw up my program. (It's a deterministic program that needs exact results; collisions render it useless.)</li> <li>Storing the data in some other compact, CPU-friendly way. 
I'm not sure how beneficial this would be, because it might involve copying data around, and most of the performance I've gained so far has come from (cleverly) avoiding copies in many situations.</li> </ul> <h3>What else can I do to improve the speed, if anything?</h3>
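For reference, the setup described in the question might look roughly like the sketch below: an `unordered_set` keyed on the sorted-vector representation of `T`, with a custom hasher and move-based insertion to avoid copies. The names `THash`, `SeenSet`, and `insert_if_new` are hypothetical, and the hash (FNV-1a-style mixing applied once per integer rather than per byte) is only an assumed stand-in for "something reasonable"; the question does not say which hash is actually used. Because the set stores the full `T` and compares with `operator==`, a hash collision costs time but cannot produce a wrong answer, unlike the fingerprint-only idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// The question's T: a lexicographically sorted stand-in for map<size_t, set<size_t>>.
using Inner = std::vector<std::size_t>;
using T = std::vector<std::pair<std::size_t, Inner>>;

// Hypothetical hasher (an assumption, not from the question):
// FNV-1a-style mixing, applied once per integer instead of once per byte.
struct THash {
    static void mix(std::uint64_t& h, std::uint64_t v) {
        h ^= v;
        h *= 1099511628211ULL;  // 64-bit FNV prime
    }
    std::size_t operator()(const T& t) const {
        std::uint64_t h = 14695981039346656037ULL;  // 64-bit FNV offset basis
        for (const auto& kv : t) {
            mix(h, kv.first);
            // Mixing in the length keeps {1:[2,3]} distinct from {1:[2], 3:[]}.
            mix(h, kv.second.size());
            for (std::size_t x : kv.second) mix(h, x);
        }
        return static_cast<std::size_t>(h);
    }
};

// Equality falls back to the exact element-wise operator== of vector and pair.
using SeenSet = std::unordered_set<T, THash>;

// Returns true if `t` had not been seen before. The value is moved into the
// set, so a caller that passes an rvalue pays for no deep copy.
bool insert_if_new(SeenSet& seen, T t) {
    return seen.insert(std::move(t)).second;
}
```

A caller that no longer needs the candidate can write `insert_if_new(seen, std::move(candidate))`. Whether this beats the existing code would have to be measured, since the question reports that changing the hash function alone made no measurable difference.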
Tags
<c++><performance><data-structures><set><hashtable>
Title
How to make a fast dictionary that contains another dictionary?
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USMehrdad
UserOwnerUserId
1. USMehrdad
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POHow to make a fast dictionary that contains another dictionary?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
