<p>Perhaps one way to achieve what you're asking is to index each class of annotation (Word, POS, Chunk, NER) at the same position and prefix each annotation with a unique string. In Lucene terms, every annotation token after the word gets a position increment of 0. Don't bother with a prefix for words. You will need a custom Analyzer to preserve the prefixes, but then you should be able to use the query syntax you want.</p> <p>Specifically, I am proposing that you index the following tokens at the specified positions (<code>-</code> means no NER token is indexed at that position):</p> <pre><code>Position  Word   POS      Chunk     NER
========  =====  =======  ========  ============
1         The    POS=DT   CHUNK=NP  NER=Person
2         man    POS=NN   CHUNK=NP  NER=Person
3         went   POS=VBD  CHUNK=VP  -
4         to     POS=TO   CHUNK=PP  -
5         the    POS=DT   CHUNK=NP  NER=Location
6         store  POS=NN   CHUNK=NP  NER=Location
</code></pre> <p>To get order-sensitive semantics, use a position-aware <a href="http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/spans/SpanQuery.html" rel="nofollow">SpanQuery</a> to preserve token sequence. For example, use a <code>SpanNearQuery</code> with zero slop and in-order matching whose clauses are <a href="http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/spans/SpanTermQuery.html" rel="nofollow">SpanTermQuery</a> instances.</p> <p>I haven't tried this, but indexing the different classes of terms at the same position should let position-sensitive queries correctly evaluate expressions such as</p> <blockquote> <p>NER=Person arrived at NER=Location</p> </blockquote> <p>Note the difference from your example: I dropped the Word= prefix so that plain words are the default. Also, your choice of prefix syntax (e.g., "class=") may constrain the contents of the documents you are indexing. Make sure that the documents either don't contain strings that look like your prefixes, or that you escape such strings during pre-processing. This is, of course, related to the analyzer you'll need to use.</p> <p><strong>Update:</strong> I used this technique to index sentence and paragraph boundaries in text (using <code>break=sen</code> and <code>break=para</code> tokens) so that I could decide where to break phrase query matches. It seems to work just fine.</p>
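<p>Here is a toy positional index in Python that makes the stacked-position idea concrete. It is not Lucene and does not use its API. It only models two things: tokens that share a position (what Lucene tokens with a position increment of 0 give you), and an in-order, zero-slop sequence match, roughly what a SpanNearQuery over SpanTermQuery clauses does. The function names are hypothetical.</p>

```python
from collections import defaultdict


def build_index(rows):
    """Build a toy positional index (not Lucene).

    rows: one list of tokens per position, e.g. ["man", "POS=NN", "CHUNK=NP"].
    All tokens in a row share the same position, which models Lucene tokens
    stacked with a position increment of 0.
    Returns a dict mapping each term to the set of positions where it occurs.
    """
    postings = defaultdict(set)
    for position, tokens in enumerate(rows, start=1):
        for token in tokens:
            postings[token].add(position)
    return postings


def match_sequence(postings, terms):
    """Return the start positions where the terms occur at consecutive
    positions, in order (an in-order match with zero slop)."""
    if not terms:
        return []
    candidates = sorted(postings.get(terms[0], set()))
    return [
        start
        for start in candidates
        if all(start + offset in postings.get(term, set())
               for offset, term in enumerate(terms))
    ]


# The example sentence from the table; "-" cells are simply omitted.
sentence = [
    ["The", "POS=DT", "CHUNK=NP", "NER=Person"],
    ["man", "POS=NN", "CHUNK=NP", "NER=Person"],
    ["went", "POS=VBD", "CHUNK=VP"],
    ["to", "POS=TO", "CHUNK=PP"],
    ["the", "POS=DT", "CHUNK=NP", "NER=Location"],
    ["store", "POS=NN", "CHUNK=NP", "NER=Location"],
]
index = build_index(sentence)

print(match_sequence(index, ["NER=Person", "went", "to", "NER=Location"]))  # prints [2]
```

<p>Every annotation sits at its word's position, so a query can freely mix plain words and prefixed classes. For example, <code>["went", "POS=TO"]</code> matches starting at position 3.</p>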
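<p>For the escaping step, a minimal pre-processing sketch in Python might look like the code below. The prefix list, the backslash escape character and the function names are all hypothetical choices, not part of Lucene. The point is that a document that literally contains <code>NER=Person</code> gets indexed as a plain word rather than as an entity annotation. The same escaping must also be applied to literal words at query time so that index terms and query terms agree.</p>

```python
# Hypothetical annotation prefixes; use whatever prefix syntax you settle on.
RESERVED_PREFIXES = ("POS=", "CHUNK=", "NER=", "break=")
ESCAPE = "\\"  # a single backslash


def escape_word(word):
    """Escape a raw word from the document so it can never be read as an
    annotation token. Words that already start with the escape character
    are escaped too, so that unescape_word() is an exact inverse."""
    if word.startswith(ESCAPE) or word.startswith(RESERVED_PREFIXES):
        return ESCAPE + word
    return word


def unescape_word(token):
    """Undo escape_word() by stripping one leading escape character."""
    return token[len(ESCAPE):] if token.startswith(ESCAPE) else token


def is_annotation(token):
    """True only for annotator-produced tokens, never for escaped words."""
    return token.startswith(RESERVED_PREFIXES)


raw_words = ["the", "memo", "quoted", "NER=Person", "verbatim"]
escaped = [escape_word(w) for w in raw_words]
print(escaped)
```

<p>In a real setup this logic would more naturally live inside the custom Analyzer (for example in a TokenFilter) than in a separate pass. That placement is a suggestion only.</p>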