StackOverflow2013

plurals

PO
primarykey
Id
3999216
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2010-10-22T17:09:20.700
FavoriteCount
0
LastActivityDate
2010-10-22T20:09:39.627
LastEditDate
2010-10-22T20:09:39.627
LastEditorUserId
345345
OwnerUserId
345345
ParentId
3901266
PostTypeId
2
Score
6
ViewCount
0
LastEditorDisplayName
text
Body
<p>Try a stochastic regular grammar (equivalent to a hidden Markov model) with the following rules:</p>
<pre><code>for every word in a dictionary:
stream -> word_i stream with probability p_w
word_i -> letter_i1 ... letter_in with probability q_w (this is the spelling of word_i)
stream -> letter stream with prob p (for any letter)
stream -> epsilon with prob 1
</code></pre>
<p>The probabilities could be derived from a training set, but see the following discussion.</p>
<p>The most likely parse is computed using the Viterbi algorithm, which has quadratic time complexity in the number of hidden states, in this case your vocabulary, so you could run into speed issues with large vocabularies. But what if you set p_w = 1 and q_w = 1 for every word, and p = .5? These are the probabilities of an artificial language model in which all words are equally likely and all non-words are equally unlikely. Of course you could segment better without this simplification, but the algorithmic complexity goes down by quite a bit.</p>
<p>If you look at the recurrence relation in the <a href="http://en.wikipedia.org/wiki/Viterbi_algorithm" rel="nofollow">Wikipedia entry</a>, you can try to simplify it for this special case. The Viterbi parse probability up to position k can be simplified to <code>VP(k) = max_l(VP(k-l) * (1 if text[k-l:k] is a word else .5^l))</code>. You can bound l by the maximum length of a word, and check whether a run of l letters forms a word with a hash lookup. The complexity of this is independent of the vocabulary size and is <code>O(<text length> * <max l>)</code>. Sorry, this is not a proof, just a sketch, but it should get you going.</p>
<p>Another potential optimization: if you create a trie of the dictionary, you can check whether a substring is a prefix of any correct word. So when you query text[k-l:k] and get a negative answer, you already know that the same is true for text[k-l:k+d] for any d. To take advantage of this you would have to rearrange the recursion significantly, so I am not sure this can be fully exploited (it can; see comment).</p>
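The simplified recurrence above could be sketched roughly as follows. This is only an illustration, not the answer author's code: the function name `segment`, its parameters, and the return shape are assumptions made for the example. It also works with log probabilities instead of raw products, a swapped-in detail that keeps long inputs from underflowing to zero and does not change which parse wins.

```python
import math


def segment(text, words, p=0.5):
    """Most likely split of `text` under the simplified model.

    Hypothetical helper: a dictionary word costs nothing (probability 1),
    and any other chunk of l letters costs p**l. Scores are kept as log
    probabilities to avoid floating-point underflow on long inputs.
    """
    words = set(words)  # hash lookup, independent of vocabulary size
    max_len = max((len(w) for w in words), default=1)
    n = len(text)

    best = [-math.inf] * (n + 1)  # best[k] = log VP(k)
    back = [0] * (n + 1)          # back[k] = length of the last chunk
    best[0] = 0.0                 # empty prefix has probability 1

    for k in range(1, n + 1):
        # l is bounded by the longest dictionary word (and by k itself).
        for l in range(1, min(max_len, k) + 1):
            chunk = text[k - l:k]
            cost = 0.0 if chunk in words else l * math.log(p)
            score = best[k - l] + cost
            if score > best[k]:
                best[k] = score
                back[k] = l

    # Walk the back pointers from the end to recover the chunks.
    pieces = []
    k = n
    while k > 0:
        l = back[k]
        pieces.append(text[k - l:k])
        k -= l
    pieces.reverse()
    return pieces, best[n]
```

Calling `segment("thecatsat", {"the", "cat", "sat"})` would be expected to split the text into the three dictionary words with a log probability of 0. Letters that belong to no word fall out as single-character chunks, because ties between equally unlikely non-word chunks are resolved in favor of the shortest one.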
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POFind the words in a long stream of characters. Auto-tokenize
  singulars
  PostTypePostTypeId
  PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USpiccolbo
UserOwnerUserId
1. USpiccolbo
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
2. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
3. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
CommentsPostId
