SOLR generates phrase queries on punctuation
<p>I have the following analyzer chain in my SOLR 3.5 instance (although we don't have luceneMatchVersion set):</p>
<pre><code>&lt;fieldtype name="text_pt" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false"&gt;
  &lt;analyzer type="index"&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory" /&gt;
    &lt;filter class="solr.ASCIIFoldingFilterFactory" protected="protwords.txt" /&gt;
    &lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" preserveOriginal="1" catenateWords="1" catenateNumbers="1" catenateAll="0" /&gt;
    &lt;filter class="solr.LowerCaseFilterFactory" /&gt;
    &lt;filter class="solr.StopFilterFactory" ignoreCase="false" words="portugueseStopWords.txt" /&gt;
    &lt;filter class="solr.BrazilianStemFilterFactory" /&gt;
    &lt;filter class="solr.RemoveDuplicatesTokenFilterFactory" /&gt;
  &lt;/analyzer&gt;
  &lt;analyzer type="query"&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory" /&gt;
    &lt;filter class="solr.ASCIIFoldingFilterFactory" protected="protwords.txt" /&gt;
    &lt;filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="portugueseSynonyms.txt" expand="true" /&gt;
    &lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" preserveOriginal="1" catenateNumbers="0" catenateAll="0" protected="protwords.txt" /&gt;
    &lt;filter class="solr.LowerCaseFilterFactory" /&gt;
    &lt;filter class="solr.StopFilterFactory" ignoreCase="false" words="portugueseStopWords.txt" /&gt;
    &lt;filter class="solr.BrazilianStemFilterFactory" /&gt;
    &lt;filter class="solr.RemoveDuplicatesTokenFilterFactory" /&gt;
  &lt;/analyzer&gt;
&lt;/fieldtype&gt;
</code></pre>
<p>Notice that we have <strong>autoGeneratePhraseQueries="false"</strong>. <br> As I understand it, this should prevent the query parser from generating a phrase query when it encounters a punctuation symbol. However, this is not happening.</p>
<p>Look at the output of &amp;debugQuery:</p>
<pre><code>&lt;str name="querystring"&gt;title_search_pt:(looking,for peugeot)&lt;/str&gt;
&lt;str name="parsedquery"&gt;+PhraseQuery(title_search_pt:"looking for") +title_search_pt:peugeot&lt;/str&gt;
</code></pre>
<p>As you can see, the text around the comma, although tokenized correctly, ends up as a phrase query. Lots of people copy and paste text into the search box (punctuation included), and by my logic punctuation should not mean "do a phrase query with that". <br></p>
<p><strong>How can I disable/prevent this from happening?</strong></p>
<p>The StandardTokenizer understands dots and commas perfectly and splits only when necessary (keeping hostnames, numbers, etc.), so replacing all the punctuation on the client before querying SOLR is not an option, as I'd be losing some important information.</p>
<p>Thanks</p>
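<p>To make the last point concrete, here is a minimal Python sketch of the naive client-side workaround the question rules out. It is an illustration only: the helper name <code>strip_punctuation</code> and the sample strings (other than the query from the debug output) are hypothetical, and the code does not reproduce what SOLR or the StandardTokenizer actually does.</p>

```python
import re


def strip_punctuation(query: str) -> str:
    """Hypothetical client-side helper: replace every dot and comma with a space."""
    return re.sub(r"[.,]", " ", query)


# The first sample is the query from the debug output; the others are made-up
# examples of input where the punctuation carries meaning.
samples = ["looking,for peugeot", "www.example.com", "1,5 litros"]

for sample in samples:
    print(repr(sample), "->", repr(strip_punctuation(sample)))
```

<p>The first query comes out as intended, but the hostname and the decimal number are broken into separate terms before they ever reach the analyzer, which is the information loss described above.</p>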