StackOverflow2013

plurals

PO
primarykey
Id
5259857
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2011-03-10T12:54:19.667
FavoriteCount
0
LastActivityDate
2011-03-10T13:02:04.443
LastEditDate
2011-03-10T13:02:04.443
LastEditorUserId
635541
OwnerUserId
635541
ParentId
5258700
PostTypeId
2
Score
4
ViewCount
0
LastEditorDisplayName
text
Body
TSP isn't a great way to think about this problem. Let n be the length of the text and m be the length of the query; assume n > m. The naive solution
<pre><code>best = infinity
for i = 1 to n
    for j = i to n
        all_found = true
        for k = 1 to m
            found = false
            for l = i to j
                if text[l] == query[k]
                    found = true
            all_found = all_found && found
        if all_found && j - i < best
            best = j - i
            best_i = i
            best_j = j
</code></pre>
is already polynomial-time at O(n<sup>3</sup> m) for bounded-length words. Now let's optimize. First, replace the innermost loop with a hash set lookup.
<pre><code>best = infinity
for i = 1 to n
    for j = i to n
        subtext_set = {}
        for l = i to j
            subtext_set = subtext_set union {text[l]}
        all_found = true
        for k = 1 to m
            all_found = all_found && query[k] in subtext_set
        if all_found && j - i < best
            best = j - i
            best_i = i
            best_j = j
</code></pre>
The running time is now O(n<sup>3</sup>), or O(n<sup>3</sup> log n) if we use a binary tree instead. Observe now that it's wasteful to recompute <code>subtext_set</code> when the upper bound increases by one.
<pre><code>best = infinity
for i = 1 to n
    subtext_set = {}
    for j = i to n
        subtext_set = subtext_set union {text[j]}
        all_found = true
        for k = 1 to m
            all_found = all_found && query[k] in subtext_set
        if all_found && j - i < best
            best = j - i
            best_i = i
            best_j = j
</code></pre>
We're at O(n<sup>2</sup> m). Now it seems wasteful to recheck the entire query when <code>subtext_set</code> is augmented by just one element: why don't we just check that one, and remember how many we have to go?
<pre><code>query_set = {}
for k = 1 to m
    query_set = query_set union {query[k]}
best = infinity
for i = 1 to n
    subtext_set = {}
    num_found = 0
    for j = i to n
        if text[j] in query_set && text[j] not in subtext_set
            subtext_set = subtext_set union {text[j]}
            num_found += 1
        if num_found == m && j - i < best
            best = j - i
            best_i = i
            best_j = j
</code></pre>
We're at O(n<sup>2</sup>). Getting to O(n) requires a couple of insights.
First, let's look at how many query words each substring contains for the example
<pre><code>text  = Bar has a computer at home. Bar
        1   2   3 4        5  6     7
query = Bar computer a

# j 1 2 3 4 5 6 7
i +--------------
1 | 1 1 2 3 3 3 3
2 | 0 0 1 2 2 2 3
3 | 0 0 1 2 2 2 3
4 | 0 0 0 1 1 1 2
5 | 0 0 0 0 0 0 1
6 | 0 0 0 0 0 0 1
7 | 0 0 0 0 0 0 1
</code></pre>
This matrix has non-increasing columns and non-decreasing rows, and that's true in general. We want to traverse the underside of the entries with value m, because further in corresponds to a longer solution. The algorithm is the following. If the current i, j have all of the query words, then increase i; otherwise, increase j. With our current data structures, increasing j is fine but increasing i is not, because our data structures don't support deletion. Instead of a set, we need to keep a multi-set and decrement <code>num_found</code> when the last copy of a query word disappears.
<pre><code>best = infinity
count = hash table whose entries are zero by default
for k = 1 to m
    count[query[k]] = -1
num_found = 0
i = 1
j = 0
while true
    if num_found == m
        if j - i < best
            best = j - i
            best_i = i
            best_j = j
        count[text[i]] -= 1
        if count[text[i]] == -1
            num_found -= 1
        i += 1
    else
        j += 1
        if j > n
            break
        if count[text[j]] == -1
            num_found += 1
        count[text[j]] += 1
</code></pre>
We've arrived at O(n). The last asymptotically relevant optimization is to reduce the extra space usage from O(n) to O(m) by storing counts only for elements in the query. I'll leave that one as an exercise. (Also, some more care must be taken to handle empty queries.)
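For readers who want something runnable, here is a minimal Python sketch of the two-pointer idea above. It is one possible rendering, not the answer's own code: the function name `shortest_window`, the 0-based inclusive indices, and the choice to return `None` for an empty or unsatisfiable query are all assumptions made for illustration. It also compares `num_found` against the number of distinct query words rather than m, and it keeps counts only for query words, which is one way to reach the O(m) extra space mentioned above.

```python
from collections import Counter


def shortest_window(text, query):
    """Return (i, j), 0-based inclusive, of a shortest span of `text` that
    contains every distinct word of `query`; None if there is no such span.

    Hypothetical helper for illustration; names and conventions are assumed.
    """
    needed = set(query)
    if not needed:
        return None  # assumption: treat the empty query as "no answer"

    count = Counter()   # multiset of query words inside the current window
    num_found = 0       # distinct query words currently inside the window
    best = None
    i = 0

    for j, word in enumerate(text):
        # Extend the window on the right (the "increase j" step).
        if word in needed:
            count[word] += 1
            if count[word] == 1:
                num_found += 1

        # While the window covers the query, record it and shrink from the
        # left (the "increase i" step).
        while num_found == len(needed):
            if best is None or j - i < best[1] - best[0]:
                best = (i, j)
            left = text[i]
            if left in needed:
                count[left] -= 1
                if count[left] == 0:
                    num_found -= 1
            i += 1

    return best
```

On the example above, `shortest_window("Bar has a computer at home. Bar".split(), ["Bar", "computer", "a"])` selects the span "Bar has a computer". Each index advances at most n times, so the running time stays linear in n, assuming constant-time hash operations on bounded-length words.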
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POAlgorithm for finding shortest sentence matching a pattern
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USuser635541
UserOwnerUserId
1. USuser635541
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POAlgorithm for finding shortest sentence matching a pattern
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
