<p>I wasn't able to get Lucene's FastVectorHighlighter (FVH) to handle phrase queries correctly, and wound up having to develop my own summarizer. The gist of my approach is discussed <a href="https://stackoverflow.com/questions/5838542/matching-token-sequences" title="Matching Token Sequences | StackOverflow">here</a>. What I wound up doing was creating an array of objects, one for each occurrence of a term that I pulled from the queries. Each object records the term's index in the query, its position in the document, and whether it has already been used in a match. These are the <code>TermAtPosition</code> instances in the sample below. Then, given a position span and an array of word identities (indexes) corresponding to a phrase query, I iterated through the array. For each unused term, I looked for all of the phrase's other term indexes within the span that follows it. If I found a match, I marked each matching term as consumed and added the matching positions to a list of matches, which I could then use to score sentences. Here is the matching code:</p>
<pre><code>protected void scorePassage(TermPositionVector v, String[] words, int span,
        float score, SentenceScore[] scores, Scorer scorer) {
    // Occurrences of the query words, in document order
    TermAtPosition[] order = getTermsInOrder(v, words);
    if (order.length &lt; words.length)
        return; // too few occurrences to match the whole phrase

    int[] positions = new int[words.length];
    List&lt;int[]&gt; matches = new ArrayList&lt;int[]&gt;();
    for (int t = 0; t &lt; order.length; t++) {
        TermAtPosition tap = order[t];
        if (tap.consumed)
            continue; // already part of an earlier match
        int p = 0;
        positions[p++] = tap.position;
        // Look for every other phrase word within the span after this term
        for (int u = 0; u &lt; words.length; u++) {
            if (u == tap.termIndex)
                continue;
            int nextTermPos = spanContains(order, u, tap.position, span);
            if (nextTermPos == -1)
                break;
            positions[p++] = nextTermPos;
        }
        // got all terms: recordMatch (not shown) marks them as consumed
        if (p == words.length)
            matches.add(recordMatch(order, positions.clone()));
    }
    if (matches.size() &gt; 0)
        for (SentenceScore sentenceScore : scores) {
            for (int[] matchingPositions : matches)
                scorer.scorePassage(sentenceScore, matchingPositions, score);
        }
}

// Returns the position of the first unconsumed occurrence of targetWord
// in the window (start, start + span], or -1 if there is none.
protected int spanContains(TermAtPosition[] order, int targetWord, int start, int span) {
    for (int i = 0; i &lt; order.length; i++) {
        TermAtPosition tap = order[i];
        if (tap.consumed || tap.position &lt;= start || (tap.position &gt; start + span))
            continue;
        if (tap.termIndex == targetWord)
            return tap.position;
    }
    return -1;
}
</code></pre>
<p>This approach seems to work, but it is greedy. Given the sequence "a a b c", it matches the first a (leaving the second a alone) and then matches b and c. I think a bit of recursion or integer programming could make it less greedy, but I couldn't be bothered, and I wanted a faster rather than a more accurate algorithm anyway.</p>
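<p>To try the greedy matching loop outside Lucene, here is a minimal, self-contained sketch of the same strategy. It is an illustration under stated assumptions, not the code above. <code>GreedySpanMatcher</code>, <code>findMatches</code> and <code>termsInOrder</code> are hypothetical names. A plain <code>String[]</code> of tokens replaces the <code>TermPositionVector</code>, query words are assumed to be distinct, consumed terms are marked inline instead of in <code>recordMatch</code>, and the scoring step is left out.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GreedySpanMatcher {

    // Simplified stand-in for TermAtPosition: which query word this token
    // matches, where it occurs, and whether a match has already used it.
    static final class TermAtPosition {
        final int termIndex;
        final int position;
        boolean consumed;

        TermAtPosition(int termIndex, int position) {
            this.termIndex = termIndex;
            this.position = position;
        }
    }

    // Hypothetical replacement for getTermsInOrder(): scans a token array in
    // document order and keeps only tokens that are query words.
    // Assumes the query words are distinct (indexOf returns the first hit).
    static TermAtPosition[] termsInOrder(String[] tokens, String[] words) {
        List<String> wordList = Arrays.asList(words);
        List<TermAtPosition> out = new ArrayList<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            int idx = wordList.indexOf(tokens[pos]);
            if (idx >= 0) out.add(new TermAtPosition(idx, pos));
        }
        return out.toArray(new TermAtPosition[0]);
    }

    // Same test as spanContains(): first unconsumed occurrence of targetWord
    // in the window (start, start + span], or -1.
    static int spanContains(TermAtPosition[] order, int targetWord, int start, int span) {
        for (TermAtPosition tap : order) {
            if (tap.consumed || tap.position <= start || tap.position > start + span) continue;
            if (tap.termIndex == targetWord) return tap.position;
        }
        return -1;
    }

    // Greedy loop from scorePassage(), returning the matches instead of scoring them.
    static List<int[]> findMatches(String[] tokens, String[] words, int span) {
        TermAtPosition[] order = termsInOrder(tokens, words);
        List<int[]> matches = new ArrayList<>();
        int[] positions = new int[words.length];
        for (TermAtPosition tap : order) {
            if (tap.consumed) continue;
            int p = 0;
            positions[p++] = tap.position;
            for (int u = 0; u < words.length; u++) {
                if (u == tap.termIndex) continue;
                int next = spanContains(order, u, tap.position, span);
                if (next == -1) break;
                positions[p++] = next;
            }
            if (p == words.length) {
                int[] match = positions.clone();
                // Mark every matched token as consumed so later matches cannot reuse it.
                for (TermAtPosition t : order)
                    for (int mp : match)
                        if (t.position == mp) t.consumed = true;
                matches.add(match);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "a", "b", "c"};
        String[] phrase = {"a", "b", "c"};
        for (int[] m : findMatches(tokens, phrase, 3))
            System.out.println(Arrays.toString(m)); // prints [0, 2, 3]
    }
}
```

<p>On "a a b c" with a span of 3, the sketch returns <code>[0, 2, 3]</code>: the first a is taken and the second is left alone. This is the greedy behaviour described above, even though the tighter match <code>[1, 2, 3]</code> exists. With a span of 2, the first a cannot reach c, so the loop moves on to the second a and returns <code>[1, 2, 3]</code>.</p>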