StackOverflow2013


plurals

PO
primarykey
Id
5748506
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2011-04-21T18:54:37.567
FavoriteCount
0
LastActivityDate
2011-05-08T04:49:17.023
LastEditDate
2011-05-08T04:49:17.023
LastEditorUserId
389051
OwnerUserId
389051
ParentId
2883012
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
Perhaps one way to achieve what you're asking is to index each class of annotation (i.e., Word, POS, Chunk, NER) at the same position and prefix each annotation with a string unique to its class. Don't bother with prefixes for words. You will need a custom Analyzer to preserve the prefixes, but then you should be able to use the syntax you want for queries.
To be specific, what I am proposing is that you index the following tokens at the specified positions:
<pre><code>Position  Word   POS      Chunk     NER
========  =====  =======  ========  ============
1         The    POS=DT   CHUNK=NP  NER=Person
2         man    POS=NN   CHUNK=NP  NER=Person
3         went   POS=VBD  CHUNK=VP  -
4         to     POS=TO   CHUNK=PP  -
5         the    POS=DT   CHUNK=NP  NER=Location
6         store  POS=NN   CHUNK=NP  NER=Location
</code></pre>
To get the semantics, use <a href="http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/spans/SpanQuery.html" rel="nofollow">SpanQuery</a> or <a href="http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/spans/SpanTermQuery.html" rel="nofollow">SpanTermQuery</a> to preserve token sequence. I haven't tried this, but indexing the different classes of terms at the same position should allow position-sensitive queries to do the right thing when evaluating expressions such as
<blockquote> NER=Person arrived at NER=Location </blockquote>
Note the difference from your example: I dropped the Word= prefix so that plain words are treated as the default. Also, your choice of prefix syntax (e.g., "class=") may constrain the contents of the documents you are indexing. Make sure that the documents either don't contain such strings, or that you escape them in some way in pre-processing. This is, of course, related to the analyzer you'll need to use.
Update: I used this technique for indexing sentence and paragraph boundaries in text (using <code>break=sen</code> and <code>break=para</code> tokens) so that I could decide where to break phrase query matches. It seems to work just fine.
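The idea of stacking annotation tokens at a word's position can be illustrated without Lucene. The following is a minimal sketch in plain Python, not Lucene's API: `SENTENCE`, `build_index`, and `phrase_match` are hypothetical names introduced here, the index is just a dictionary from token to positions, and the query uses "went to" (the words in the table) rather than "arrived at". It assumes document words never contain "=", which is the escaping concern raised in the answer.

```python
from collections import defaultdict

# Hypothetical annotated sentence, one (word, POS, chunk, NER) tuple per
# position, mirroring the table in the answer; "-" means no NER annotation.
SENTENCE = [
    ("The", "DT", "NP", "Person"),
    ("man", "NN", "NP", "Person"),
    ("went", "VBD", "VP", "-"),
    ("to", "TO", "PP", "-"),
    ("the", "DT", "NP", "Location"),
    ("store", "NN", "NP", "Location"),
]


def build_index(rows):
    """Map each token to the set of positions at which it occurs.

    Every annotation token for a word is stored at that word's position.
    """
    index = defaultdict(set)
    for position, (word, pos, chunk, ner) in enumerate(rows, start=1):
        index[word.lower()].add(position)  # plain words carry no prefix
        index["POS=" + pos].add(position)
        index["CHUNK=" + chunk].add(position)
        if ner != "-":
            index["NER=" + ner].add(position)
    return index


def phrase_match(index, terms):
    """Return the start positions where `terms` occupy consecutive positions."""
    if not terms:
        return []

    def positions(term):
        # Prefixed annotation tokens are matched as-is; words are lowercased.
        key = term if "=" in term else term.lower()
        return index.get(key, set())

    return [
        start
        for start in sorted(positions(terms[0]))
        if all(start + offset in positions(term)
               for offset, term in enumerate(terms))
    ]


index = build_index(SENTENCE)
print(phrase_match(index, ["NER=Person", "went", "to", "NER=Location"]))
print(phrase_match(index, ["POS=DT", "POS=NN"]))
```

Because words and annotations share positions, a single phrase-style query can mix both kinds of term. In Lucene itself, the same effect would presumably come from the custom analyzer emitting the annotation tokens with a position increment of zero.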
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO Indexing and Searching Over Word Level Annotation Layers in Lucene
 singulars
 PostTypePostTypeId
 PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. US Gene Golovchinsky
UserOwnerUserId
1. US Gene Golovchinsky
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data is linked in either a to-one or a to-many fashion. The caption of each to-many relation links to a list view of the related table, filtered to the matching rows.

Try browsing until you find a non-empty to-many entry, then click its label to open the list view. You can return to the root by clicking the database name in the header.