Solr (Lucene) is indexing only the first document after adding a custom TokenFilter
<p>I created a custom token filter that concatenates all the tokens in the stream. This is my <code>incrementToken()</code> function:</p>
<pre><code>public boolean incrementToken() throws IOException {
    if (finished) {
        logger.debug("Finished");
        return false;
    }
    logger.debug("Starting");
    StringBuilder buffer = new StringBuilder();
    int length = 0;
    while (input.incrementToken()) {
        if (0 == length) {
            buffer.append(termAtt);
            length += termAtt.length();
        } else {
            buffer.append(" ").append(termAtt);
            length += termAtt.length() + 1;
        }
    }
    termAtt.setEmpty().append(buffer);
    //offsetAtt.setOffset(0, length);
    finished = true;
    return true;
}
</code></pre>
<p>I added the new filter to the end of the index and query analysis chains for a field. Testing it from <a href="http://localhost:8983/solr/admin/analysis.jsp" rel="nofollow">http://localhost:8983/solr/admin/analysis.jsp</a> suggests it works: the filter concatenates the tokens in the stream. But when I re-index the documents, only my first document gets indexed.</p>
<p>This is what my filter chain looks like:</p>
<pre><code>&lt;analyzer type="index"&gt;
  &lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " /&gt;
  &lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" /&gt;
  &lt;tokenizer class="solr.WhitespaceTokenizerFactory" /&gt;
  &lt;filter class="solr.LowerCaseFilterFactory" /&gt;
  &lt;filter class="solr.StopWordFilterFactory" ignoreCase="true" words="words.txt" /&gt;
  &lt;filter class="org.custom.solr.analysis.ConcatFilterFactory" /&gt;
&lt;/analyzer&gt;
&lt;analyzer type="query"&gt;
  &lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " /&gt;
  &lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" /&gt;
  &lt;tokenizer class="solr.WhitespaceTokenizerFactory" /&gt;
  &lt;filter class="solr.LowerCaseFilterFactory" /&gt;
  &lt;filter class="solr.StopWordFilterFactory" ignoreCase="true" words="words.txt" /&gt;
  &lt;filter class="org.custom.solr.analysis.ConcatFilterFactory" /&gt;
&lt;/analyzer&gt;
</code></pre>
<p>Without the <code>ConcatFilterFactory</code>, all words are indexed properly, but with it only the first document is indexed. What am I doing wrong? Please help me understand the problem.</p>
<p><strong>UPDATE:</strong></p>
<p>Finally figured out the issue.</p>
<pre><code>if (finished) {
    logger.debug("Finished");
    finished = false;
    return false;
}
</code></pre>
<p>Looks like the same filter instance is being reused across documents, so the <code>finished</code> flag set by one document was still set when the next one arrived. Makes sense.</p>
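The reuse problem described in the update can be reproduced without Solr. Below is a minimal, dependency-free sketch; `ConcatStage` and `ConcatDemo` are hypothetical stand-ins written for illustration and are not Lucene classes. It assumes the consumer calls a `reset` method on one shared instance before each document, which mirrors the `TokenStream` contract in which `reset()` is called before `incrementToken()`. In a real Lucene `TokenFilter`, the conventional place to clear per-document state such as `finished` would be an override of `reset()` that also calls `super.reset()`; the fix in the update, clearing the flag just before returning `false`, achieves the same effect here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a reusable analysis component; NOT a Lucene class.
final class ConcatStage {
    private Iterator<String> input = Collections.emptyIterator();
    private boolean finished = false;
    private final boolean resetClearsFlag;
    private String term = "";

    ConcatStage(boolean resetClearsFlag) {
        this.resetClearsFlag = resetClearsFlag;
    }

    // Called by the consumer before each new document (assumed lifecycle).
    void reset(List<String> tokens) {
        input = tokens.iterator();
        if (resetClearsFlag) {
            finished = false; // clear per-document state so the instance can be reused
        }
    }

    // Emits exactly one token per document: all input tokens joined by spaces.
    boolean incrementToken() {
        if (finished) {
            return false;
        }
        StringBuilder buffer = new StringBuilder();
        while (input.hasNext()) {
            if (buffer.length() > 0) {
                buffer.append(' ');
            }
            buffer.append(input.next());
        }
        term = buffer.toString();
        finished = true;
        return true;
    }

    String term() {
        return term;
    }
}

final class ConcatDemo {
    // Feeds every document through ONE shared stage and collects what it emits.
    static List<String> index(boolean resetClearsFlag, List<List<String>> docs) {
        ConcatStage stage = new ConcatStage(resetClearsFlag);
        List<String> indexed = new ArrayList<>();
        for (List<String> doc : docs) {
            stage.reset(doc);
            while (stage.incrementToken()) {
                indexed.add(stage.term());
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(List.of("quick", "brown"), List.of("lazy", "dog"));
        System.out.println(index(false, docs)); // stale flag: only the first document is emitted
        System.out.println(index(true, docs));  // flag cleared on reset: both documents are emitted
    }
}
```

With the flag left stale, the second document yields no tokens at all, which matches the symptom of only the first document being indexed.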
 
