StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
6715489
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2011-07-16T05:13:22.923
FavoriteCount
0
LastActivityDate
2011-07-16T06:55:06.580
LastEditDate
2011-07-16T06:55:06.580
LastEditorUserId
13663
OwnerUserId
13663
ParentId
6714715
PostTypeId
2
Score
0
ViewCount
0
LastEditorDisplayName
text
Body
Is a binary search of the table really so awful? I would take the list of potential strings and "minimize" them, the sort them, and finally do a binary search upon the block of them. By minimize I mean reducing them to the minimum they need to be, kind of a custom stemming. For example if you had the strings: "alfred", "bob", "bill", "joe", I'd knock them down to "a", "bi", "bo", "j". Then put those in to a contiguous block of memory, for example: <pre><code>char *table = "a\0bi\0bo\0j\0"; // last 0 is really redundant..but char *keys[4]; keys[0] = table; keys[1] = table + 2; keys[2] = table + 5; keys[3] = table + 8; </code></pre> Ideally the compiler would do all this for you if you simply go: <pre><code>keys[0] = "a"; keys[1] = "bi"; keys[2] = "bo"; keys[3] = "j"; </code></pre> But I can't say if that's true or not. Now you can bsearch that table, and the keys are as short as possible. If you hit the end of the key, you match. If not, then follow the standard bsearch algorithm. The goal is to get all of the data close together and keep the code itty bitty so that it all fits in to the CPU cache. You can process the key from the program directly, no pre-processing or adding anything up. For a reasonably large number of keys that are reasonably distributed, I think this would be quite fast. It really depends on the number of strings involved. For smaller numbers, the overhead of computing hash values etc is more than search something like this. For larger values, it's worth it. Just what those number are all depends on the algorithms etc. This, however, is likely the smallest solution in terms of memory, if that's important. This also has the benefit of simplicity. Addenda: You don't have any specifications on the inputs beyond 'strings'. There's also no discussion about how many strings you expect to use, their length, their commonality or their frequency of use. These can perhaps all be derived from the "source", but not planned upon by the algorithm designer. You're asking for an algorithm that creates something like this: <pre><code>inline int GetValue(char *key) { return 1234; } </code></pre> For a small program that happens to use only one key all the time, all the way up to something that creates a perfect hash algorithm for millions of strings. That's a pretty tall order. Any design going after "squeezing every single bit of performance possible" needs to know more about the inputs than "any and all strings". That problem space is simply too large if you want it the fastest possible for any condition. An algorithm that handles strings with extremely long identical prefixes might be quite different than one that works on completely random strings. The algorithm could say "if the key starts with "a", skip the next 100 chars, since they're all a's". But if these strings are sourced by human beings, and they're using long strings of the same letters, and not going insane trying to maintain that data, then when they complain that the algorithm is performing badly, you reply that "you're doing silly things, don't do that". But we don't know the source of these strings either. So, you need to pick a problem space to target the algorithm. We have all sorts of algorithms that ostensibly do the same thing because they address different constraints and work better in different situations. Hashing is expensive, laying out hashmaps is expensive. If there's not enough data involved, there are better techniques than hashing. If you have large memory budget, you could make an enormous state machine, based upon N states per node (N being your character set size -- which you don't specify -- BAUDOT? 7-bit ASCII? UTF-32?). That will run very quickly, unless the amount of memory consumed by the states smashes the CPU cache or squeezes out other things. You could possibly generate code for all of this, but you may run in to code size limits (you don't say what language either -- Java has a 64K method byte code limit for example). But you don't specify any of these constraints. So, it's kind of hard to get the most performant solution for your needs.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POFastest possible string key lookup for known set of keys
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USWill Hartung
UserOwnerUserId
1. USWill Hartung
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.