How to add options for an Analyzer in Apache Lucene?
<p>Lucene has Analyzers that tokenize and filter the corpus when indexing. Operations include converting tokens to lowercase, stemming, removing stopwords, etc.</p>
<p>I'm running an experiment where I want to try all possible combinations of analysis operations: stemming only, stopping only, stemming and stopping, ...</p>
<p>In total, there are 36 combinations that I want to try.</p>
<p>How can I do this easily and gracefully?</p>
<p>I know that I can extend the Analyzer class and implement the tokenStream() function to create my own Analyzer:</p>
<pre><code>public class MyAnalyzer extends Analyzer {
    // Builds one fixed chain; each filter wraps the stream produced by the one inside it.
    // NameFilter and CaseNumberFilter are my own TokenFilter subclasses.
    public TokenStream tokenStream(String field, final Reader reader) {
        return new NameFilter(
            new CaseNumberFilter(
                new StopFilter(
                    new LowerCaseFilter(
                        new StandardFilter(
                            new StandardTokenizer(reader))),
                    StopAnalyzer.ENGLISH_STOP_WORDS)));
    }
}
</code></pre>
<p>What I'd like to do is write one such class that takes a boolean value for each of the possible operations (doStopping, doStemming, etc.). I don't want to write 36 different Analyzer classes that each perform one of the 36 combinations. What makes this difficult is that the filters are nested inside each other's constructors.</p>
<p>Any ideas on how to do this gracefully?</p>
<p><strong>EDIT</strong>: By "gracefully", I mean that I can easily create a new Analyzer in some sort of loop:</p>
<pre><code>analyzer = new MyAnalyzer(doStemming, doStopping, ...)
</code></pre>
<p>where doStemming and doStopping change with each loop iteration.</p>
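One way to picture the requested behaviour is to hold the stream in a local variable and wrap it only when the corresponding flag is set, instead of nesting every constructor in a single expression. The sketch below is a minimal illustration in plain Java, not Lucene code: the class `ConfigurableChain`, the method `analyze`, the three flag names, the tiny stop-word set, and the toy "strip a trailing s" stemmer are all hypothetical stand-ins for the real tokenizer and filters. It uses three flags (8 combinations) rather than the 36 combinations mentioned in the question.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a configurable analyzer; no Lucene classes are used.
final class ConfigurableChain {
    // Assumed, illustrative stop-word list (not Lucene's ENGLISH_STOP_WORDS).
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");

    // Each "filter" takes a token stream and returns a wrapped token stream.
    static Stream<String> lowerCase(Stream<String> in) {
        return in.map(String::toLowerCase);
    }

    static Stream<String> stop(Stream<String> in) {
        return in.filter(t -> !STOP_WORDS.contains(t));
    }

    // Toy stemmer: drops one trailing 's'. A real stemmer is far more involved.
    static Stream<String> stem(Stream<String> in) {
        return in.map(t -> t.length() > 1 && t.endsWith("s")
                ? t.substring(0, t.length() - 1)
                : t);
    }

    // The chain is built step by step, so each stage can be switched on or off.
    static List<String> analyze(String text, boolean doLowerCase,
                                boolean doStopping, boolean doStemming) {
        Stream<String> stream = Stream.of(text.trim().split("\\s+"));
        if (doLowerCase) stream = lowerCase(stream);
        if (doStopping)  stream = stop(stream);
        if (doStemming)  stream = stem(stream);
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Enumerate every on/off combination of the three flags with a bitmask.
        for (int mask = 0; mask < 8; mask++) {
            boolean lower = (mask & 1) != 0;
            boolean stopping = (mask & 2) != 0;
            boolean stemming = (mask & 4) != 0;
            System.out.println(lower + " " + stopping + " " + stemming + " -> "
                    + analyze("The Cats of Rome", lower, stopping, stemming));
        }
    }
}
```

The same shape should carry over to a real Analyzer subclass: store the booleans in fields set by the constructor, start from the tokenizer inside tokenStream(), and reassign the TokenStream variable inside each `if` before returning it. The order of the `if` blocks fixes the order of the filters, which matters here because stopping after lowercasing removes "The" while stopping alone does not.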