<p>I found a solution to this problem: I switched from the <code>Highlighter</code> class to <code>FastVectorHighlighter</code>. It looks like I'll pick up some speed improvements too (at the expense of storing term vector data). For the benefit of anyone coming across this question later, here's a unit test showing how this all works together:</p>
<pre class="lang-java prettyprint-override"><code>package com.sample.index;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.vectorhighlight.*;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import static junit.framework.Assert.assertEquals;

public class TestIndexStuff {
    public static final String FIELD_NORMAL = "normal";
    public static final String[] PRE_TAGS = new String[]{"["};
    public static final String[] POST_TAGS = new String[]{"]"};

    private IndexSearcher searcher;
    private Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_35);

    @Before
    public void init() throws IOException {
        RAMDirectory idx = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter writer = new IndexWriter(idx, config);
        addDocs(writer);
        writer.close();
        searcher = new IndexSearcher(IndexReader.open(idx));
    }

    private void addDocs(IndexWriter writer) throws IOException {
        for (String text : new String[] {
                "Pretty much everyone likes goats.",
                "I have a goat that eats everything.",
                "goats goats goats goats goats"}) {
            Document doc = new Document();
            // FastVectorHighlighter needs term vectors stored with positions and offsets.
            doc.add(new Field(FIELD_NORMAL, text, Field.Store.YES, Field.Index.ANALYZED,
                    Field.TermVector.WITH_POSITIONS_OFFSETS));
            writer.addDocument(doc);
        }
    }

    private FastVectorHighlighter makeHighlighter() {
        FragListBuilder fragListBuilder = new SimpleFragListBuilder(200);
        FragmentsBuilder fragmentBuilder = new SimpleFragmentsBuilder(PRE_TAGS, POST_TAGS);
        // Arguments: phraseHighlight = true, fieldMatch = true.
        return new FastVectorHighlighter(true, true, fragListBuilder, fragmentBuilder);
    }

    @Test
    public void highlight() throws ParseException, IOException {
        // EnglishAnalyzer stems "goats" to "goat", so this query matches both forms.
        Query query = new QueryParser(Version.LUCENE_35, FIELD_NORMAL, analyzer)
                .parse("goat");
        FastVectorHighlighter highlighter = makeHighlighter();
        FieldQuery fieldQuery = highlighter.getFieldQuery(query);
        TopDocs topDocs = searcher.search(query, 10);

        List&lt;String&gt; fragments = new ArrayList&lt;String&gt;();
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            fragments.add(highlighter.getBestFragment(fieldQuery, searcher.getIndexReader(),
                    scoreDoc.doc, FIELD_NORMAL, 10000));
        }

        assertEquals(3, fragments.size());
        assertEquals("[goats] [goats] [goats] [goats] [goats]", fragments.get(0).trim());
        assertEquals("Pretty much everyone likes [goats].", fragments.get(1).trim());
        assertEquals("I have a [goat] that eats everything.", fragments.get(2).trim());
    }
}
</code></pre>
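<p>For anyone wondering why the stored offsets matter: as far as I understand it, the highlighter can mark matches using the character offsets recorded in the term vector instead of re-analyzing the text. Below is a rough sketch of that idea in plain Java. It is not Lucene's implementation; <code>OffsetHighlightSketch</code>, <code>Offset</code>, and <code>highlight</code> are hypothetical names made up for this illustration, and the offsets are supplied by hand rather than read from an index.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical names, not part of the Lucene API.
public class OffsetHighlightSketch {

    /** Stand-in for one stored offset: the range [start, end) in the original text. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Wraps every offset range in preTag/postTag.
     * Assumes the offsets are sorted by start and do not overlap.
     */
    static String highlight(String text, List<Offset> offsets, String preTag, String postTag) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Offset o : offsets) {
            out.append(text, cursor, o.start);   // unmatched text before the hit
            out.append(preTag).append(text, o.start, o.end).append(postTag);
            cursor = o.end;
        }
        out.append(text, cursor, text.length()); // trailing unmatched text
        return out.toString();
    }

    public static void main(String[] args) {
        List<Offset> offsets = new ArrayList<Offset>();
        offsets.add(new Offset(9, 13)); // hand-computed position of "goat"
        // prints: I have a [goat] that eats everything.
        System.out.println(highlight("I have a goat that eats everything.", offsets, "[", "]"));
    }
}
```

<p>The tags land around the original surface form ("goats" stays "goats") because the offsets point into the stored text, not at the stemmed term.</p>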