
Theoretical interview questions like this always deal in small numbers (say, 10 words). The number itself means nothing; it is there to separate out the candidates who can really reason about the problem in its general form from those who simply regurgitate fixed answers to fixed interview questions they find on the internet.

The best software houses will only favour solutions that are scalable. You will gain top marks in an interview if your answer is simple but also scales to any size of problem (or, in this case, document). So forget sorting, loops inside loops, and O(n^2) complexity; if you presented any solution like those to a leading-edge software company at interview, you would fail.

Your particular question is checking whether you know about hash tables. The most efficient solution to this problem can be written in pseudo-code as follows:

    1. Initialise a new hash table.
    For each word in the document:
      2. Generate a hash key for the word.
      3. Look up the word in the hash table using the key.
      4. If it is found, increment the count for the word.
      5. Otherwise, store the new word in the table and set its count to one.

The most important benefit of this solution is that only a single scan of the document is required: no reading words into memory and then processing them (two scans), no loops inside loops (many scans), no sorting (even more passes). After exactly one pass of the document, reading out the keys of the hash table tells you exactly how many times each word appeared. Any word with a count greater than one is a duplicate.

The secret to this solution is its use of hash tables. Generating the hash key (step 2), key lookup (step 3), and key storage (step 5) can all be implemented as near constant-time operations. This means the time these steps take hardly changes as the size of the input (the number of words) grows: whether it is the 10th word in a document or the 10 millionth, inserting that word into the hash table (or looking it up) takes roughly the same very small amount of time. In this case, we additionally keep a count of each word's frequency in step 5; incrementing a value is likewise a very efficient constant-time operation.

Any solution to this problem must scan every word in the document at least once. As our solution processes each word exactly once, with each word taking approximately the same very small constant time, it performs optimally and scales linearly, yielding O(n) performance (put simply, processing 1,000,000 words takes around 1,000 times longer than processing 1,000 words). In all, a scalable and efficient solution to the problem.
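The pseudo-code above can be sketched in Python, where the built-in `dict` serves as the hash table (the function name and sample text here are illustrative, not from the original answer):

```python
def count_words(document: str) -> dict[str, int]:
    """Single-pass word frequency count using a hash table (Python dict)."""
    counts: dict[str, int] = {}             # Step 1: initialise a new hash table.
    for word in document.lower().split():
        # Steps 2-3: hashing the word and looking it up both happen
        # inside the dict operations, in near constant time.
        if word in counts:
            counts[word] += 1               # Step 4: increment the existing count.
        else:
            counts[word] = 1                # Step 5: store the new word, count = 1.
    return counts

counts = count_words("the quick fox jumped over the lazy dog")
duplicates = [w for w, c in counts.items() if c > 1]
print(duplicates)  # ['the']
```

Each word is touched exactly once, so the running time grows linearly with the document size, matching the O(n) analysis above.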