I'll cover the basics of textual document matching...

Most document similarity measures work on a word basis rather than on sentence structure. The first step is usually [stemming](http://en.wikipedia.org/wiki/Stemming): words are reduced to their root form, so that different forms of similar words, e.g. "swimming" and "swims", match.

Additionally, you may wish to filter the words you match to avoid noise. In particular, you may wish to ignore occurrences of "the" and "a". In fact, there are many conjunctions and pronouns you may want to omit, so usually you will have a long list of such words; this is called a [stop list](http://en.wikipedia.org/wiki/Stop_words).

Furthermore, there may be bad words you wish to avoid matching, such as swear words or racial slurs, so you may have another exclusion list with such words in it, a "bad list".

Now you can count similar words in documents. The question becomes how to measure total document similarity. You need a score function that takes the matching words as input and returns a "similarity" value. Such a function should give a high value if the same word appears multiple times in both documents, and matches should be weighted by overall word frequency so that matches on uncommon words carry more statistical weight. (A sketch of this preprocessing and scoring pipeline appears at the end of this answer.)

[Apache Lucene](http://lucene.apache.org/java/docs/index.html) is an open-source search engine written in Java that provides practical detail about these steps. For example, here is the information about how it weights query similarity:

http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/Similarity.html

> Lucene combines Boolean model (BM) of Information Retrieval with Vector Space Model (VSM) of Information Retrieval - documents "approved" by BM are scored by VSM.

All of this is really just about matching words in documents. You did ask about matching sentences, but for most purposes matching words is more useful, because a huge variety of sentence structures can mean essentially the same thing; most of the useful similarity information is in the words themselves. I've talked about document matching, but for your purposes a sentence is just a very small document.

Now, as an aside, if you don't care about the actual nouns and verbs in the sentence and only care about its grammatical composition, you need a different approach.

First you need a [link grammar parser](http://en.wikipedia.org/wiki/Link_grammar) to interpret the language and build a data structure (usually a tree) that represents the sentence. Then you have to perform inexact graph matching. This is a hard problem in general, but there are algorithms that do it on trees in polynomial time (a toy illustration is sketched below).
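
To make the stemming and filtering steps concrete, here is a minimal Python sketch. It assumes NLTK is available for its Porter stemmer; the tiny hard-coded stop list and bad list are placeholders, not real lists.

```python
# Preprocessing sketch: tokenize, drop stop/bad words, then stem.
# Assumes NLTK is installed (pip install nltk); the word lists below are
# illustrative placeholders, not complete stop or bad lists.
import re

from nltk.stem import PorterStemmer

STOP_LIST = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "on"}
BAD_LIST = {"someswearword"}  # words excluded from matching entirely

stemmer = PorterStemmer()

def preprocess(text):
    """Return stemmed tokens with stop-list and bad-list words removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens
            if t not in STOP_LIST and t not in BAD_LIST]

print(preprocess("The swimmer was swimming in the pool"))
# expected: ['swimmer', 'swim', 'pool'] -- "swimming" and "swims" both stem to "swim"
```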
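
The scoring idea described above (repeated matches count more, and rare words count more than common ones) is essentially what TF-IDF weighting with cosine similarity gives you, in the same spirit as the vector space model mentioned in the Lucene documentation. Below is a self-contained sketch; the hand-made token lists stand in for the output of the `preprocess` helper above.

```python
# TF-IDF-style scoring sketch: repeated matches raise the term frequency (TF),
# rare terms get a larger inverse-document-frequency (IDF) weight, and the
# cosine of the two weighted vectors is the similarity score.
import math
from collections import Counter

def tfidf_vector(tokens, doc_freq, n_docs):
    """Map each term to tf * idf, where rarer terms get a larger idf."""
    return {term: tf * math.log(1 + n_docs / (1 + doc_freq[term]))
            for term, tf in Counter(tokens).items()}

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity(doc_a, doc_b, corpus):
    """Score two token lists against a background corpus of token lists."""
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    va = tfidf_vector(doc_a, doc_freq, len(corpus))
    vb = tfidf_vector(doc_b, doc_freq, len(corpus))
    return cosine(va, vb)

# Hand-made token lists standing in for preprocess() output.
doc_a = ["cat", "sat", "mat"]
doc_b = ["dog", "chase", "cat"]
doc_c = ["stock", "market", "fall"]
corpus = [doc_a, doc_b, doc_c]
print(similarity(doc_a, doc_b, corpus))  # shares the stem "cat" -> nonzero score
print(similarity(doc_a, doc_c, corpus))  # no shared stems -> 0.0
```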
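
For the grammar-composition aside: once a parser has produced trees, you need some form of inexact tree comparison. A proper solution would use a tree edit distance (for example the Zhang-Shasha algorithm, which runs in polynomial time). The toy sketch below only does a greedy recursive label comparison, and the `Node` class and hand-built trees are stand-ins for real parser output, just to show the shape of the problem.

```python
# Toy structural comparison of parse trees. A real system would use a proper
# tree edit distance (e.g. the Zhang-Shasha algorithm, which is polynomial);
# this greedy recursive label comparison only illustrates the idea.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                       # e.g. a phrase or part-of-speech tag
    children: list = field(default_factory=list)

def tree_similarity(a, b):
    """Crude score in [0, 1]; identical trees score 1.0."""
    label = 1.0 if a.label == b.label else 0.0
    if not a.children and not b.children:
        return label
    pairs = zip(a.children, b.children)          # greedy positional pairing
    child = sum(tree_similarity(x, y) for x, y in pairs)
    child /= max(len(a.children), len(b.children))
    return (label + child) / 2

# Hand-built stand-ins for parser output ("S -> NP VP" style skeletons).
t1 = Node("S", [Node("NP"), Node("VP", [Node("V"), Node("NP")])])
t2 = Node("S", [Node("NP"), Node("VP", [Node("V"), Node("PP")])])
print(tree_similarity(t1, t2))                   # ~0.94: same shape, one differing leaf
```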