<p>Add some instance variables to the custom Analyzer class that can be set and unset on the fly. Then, in the tokenStream() method, use those variables to decide which filters to apply.</p>
<pre><code>import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;

public class MyAnalyzer extends Analyzer {

    public static final String[] STOP_WORDS = ...;

    private Set customStopSet;
    private boolean doStemming = false;
    private boolean doStopping = false;

    // The constructor name must match the class name.
    public MyAnalyzer() {
        super();
        customStopSet = StopFilter.makeStopSet(STOP_WORDS);
    }

    public void setDoStemming(boolean val) {
        this.doStemming = val;
    }

    public void setDoStopping(boolean val) {
        this.doStopping = val;
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        // First, tokenize and convert to lower case
        TokenStream out = new LowerCaseTokenizer(reader);

        // Optionally remove stop words
        if (this.doStopping) {
            out = new StopFilter(true, out, customStopSet);
        }

        // Optionally apply Porter stemming
        if (this.doStemming) {
            out = new PorterStemFilter(out);
        }

        return out;
    }
}
</code></pre>
<p>There is one gotcha: LowerCaseTokenizer takes the reader variable (a Reader) as input and returns a TokenStream. This is fine for the filters that follow it (StopFilter, PorterStemFilter), because they take a TokenStream as input and return one as output, so they chain together nicely. However, it also means that no filter that produces a TokenStream can run before the LowerCaseTokenizer. In my case, I wanted to split camelCase words into parts, and this has to be done before converting to lower case, because lowercasing destroys the case boundaries. My solution was to perform the splitting manually in the custom Indexer class, so by the time MyAnalyzer sees the text, it has already been split.</p>
<p>(I have also added a boolean flag to my custom Indexer class, so now both can work based solely on flags.)</p>
<p>Is there a better answer?</p>
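<p>For illustration, the manual camelCase splitting described above could look roughly like the sketch below. This is not the original Indexer code: the class name CamelCaseSplitter and the method split are hypothetical, and it uses only plain Java string handling, with no Lucene calls. It assumes ASCII identifiers and simply inserts a space at each case boundary, so the text can be split before it is handed to the analyzer.</p>

```java
// Hypothetical helper (not part of Lucene or the original Indexer):
// splits camelCase words so they can be tokenized separately later.
final class CamelCaseSplitter {

    private CamelCaseSplitter() {
        // Utility class, not meant to be instantiated.
    }

    // Inserts a space at each camelCase boundary, assuming ASCII letters.
    static String split(String text) {
        return text
            // Boundary between a lower-case letter or digit and an upper-case letter.
            .replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ")
            // Boundary between an acronym and a following capitalized word.
            .replaceAll("(?<=[A-Z])(?=[A-Z][a-z])", " ");
    }
}
```

<p>With this sketch, CamelCaseSplitter.split("tokenStream") returns "token Stream", which the LowerCaseTokenizer would then turn into the two tokens "token" and "stream".</p>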
 
