How to set up ElasticSearch to do SQL LIKE "%" for email addresses?
<p>In SQL, I can search email addresses pretty well with SQL LIKE.</p>
<p>With an email "stack@domain.com", searching "stack", "@domain.com", "domain.com", or "domain" would get me back the desired email address.</p>
<p>How can I get the same result with ElasticSearch?</p>
<p>I played with nGram, edgeNGram, uax_url_email, etc., and the search results have been pretty bad. Please correct me if I'm wrong, but it sounds like I have to do the following:</p>
<ol>
<li>for index_analyzer
<ul>
<li>use the "keyword", "whitespace", or "uax_url_email" tokenizer so the email doesn't get tokenized
<ul>
<li>but wildcard queries don't seem to work (with tire at least)</li>
</ul></li>
<li>use "nGram" or "edgeNGram" as a filter
<ul>
<li>I always get way too many unwanted results, like getting "first@domain.com" when searching "first-second".</li>
</ul></li>
</ul></li>
<li>for search_analyzer
<ul>
<li>don't do nGram</li>
</ul></li>
</ol>
<p>Code from one experiment:</p>
<pre><code>tire.settings :number_of_shards =&gt; 1,
              :number_of_replicas =&gt; 1,
              :analysis =&gt; {
                :filter =&gt; {
                  :db_ngram =&gt; {
                    "type"     =&gt; "nGram",
                    "max_gram" =&gt; 255,
                    "min_gram" =&gt; 3
                  }
                },
                :analyzer =&gt; {
                  :string_analyzer =&gt; {
                    "tokenizer" =&gt; "standard",
                    "filter"    =&gt; ["standard", "lowercase", "asciifolding", "db_ngram"],
                    "type"      =&gt; "custom"
                  },
                  :index_name_analyzer =&gt; {
                    "tokenizer" =&gt; "standard",
                    "filter"    =&gt; ["standard", "lowercase", "asciifolding"],
                    "type"      =&gt; "custom"
                  },
                  :search_name_analyzer =&gt; {
                    "tokenizer" =&gt; "whitespace",
                    "filter"    =&gt; ["lowercase", "db_ngram"],
                    "type"      =&gt; "custom"
                  },
                  :index_email_analyzer =&gt; {
                    "tokenizer" =&gt; "whitespace",
                    "filter"    =&gt; ["lowercase"],
                    "type"      =&gt; "custom"
                  }
                }
              } do
  mapping do
    indexes :id,    :index =&gt; :not_analyzed
    indexes :name,  :index_analyzer =&gt; 'index_name_analyzer',  :search_analyzer =&gt; 'search_name_analyzer'
    indexes :email, :index_analyzer =&gt; 'index_email_analyzer', :search_analyzer =&gt; 'search_email_analyzer'
  end
end
</code></pre>
<p>Specific cases that don't work well:</p>
<ul>
<li>emails with a hyphen (e.g. email-hyphen@domain.com)</li>
<li>query strings with '@' at the beginning or end</li>
<li>exact matches</li>
<li>searching with a wildcard like '*@*' gets very unexpected results.</li>
</ul>
<p>Suppose I have "aaa@email.com", "aaa_0@email.com", and "aaa-0@email.com". Searching "aaa" gives me "aaa@email.com" and "aaa-0@email.com". Searching "aaa*" gives me everything, but "aaa-*" gives me nothing. So, how should I do <strong>exact match wildcard</strong> queries? For these types of queries, I get pretty much the same results with different tokenizers/analyzers.</p>
<p>After each mapping change, I run the following, in order:</p>
<ol>
<li><code>Model.tire.index.delete</code></li>
<li><code>Model.tire.create_elasticsearch_index</code></li>
<li><code>Model.tire.index.import Model.all</code></li>
</ol>
<p>References:</p>
<ul>
<li><a href="https://stackoverflow.com/questions/14160295/configure-elasticsearch-to-use-ngram-by-default-sql-like-behavior">Configure ElasticSearch to use ngram by default. - SQL LIKE %% behavior</a></li>
<li><a href="http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/" rel="nofollow noreferrer">http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/</a></li>
</ul>
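<p>To make the unwanted-results case concrete, here is a minimal plain-Ruby sketch of what I think is going on. It is a simplified in-memory model, not ElasticSearch's real analysis chain: the helper names (ngrams, like_match, build_index, search_plain, search_ngrammed) and the sample addresses are made up for illustration, and it assumes each email is indexed as a single lowercased token that is then n-grammed with min_gram 3 and max_gram 255.</p>

```ruby
# Simplified in-memory model; NOT ElasticSearch's actual analysis chain.
# All names and sample addresses below are hypothetical.
EMAILS = ["first@domain.com", "first-second@domain.com", "stack@domain.com"].freeze

# Every substring of length min..max, roughly what an nGram filter
# would emit for a single token.
def ngrams(token, min: 3, max: 255)
  grams = []
  (min..[max, token.length].min).each do |n|
    (0..token.length - n).each { |i| grams << token[i, n] }
  end
  grams.uniq
end

# Target behavior: SQL LIKE '%term%', modeled as a case-insensitive substring test.
def like_match(emails, term)
  emails.select { |e| e.downcase.include?(term.downcase) }
end

# Index side: keep the whole lowercased email as one token, then n-gram it.
def build_index(emails)
  emails.each_with_object({}) { |e, idx| idx[e] = ngrams(e.downcase) }
end

# Search side A: the query is only lowercased and must equal an indexed gram.
def search_plain(index, term)
  index.select { |_, grams| grams.include?(term.downcase) }.keys
end

# Search side B: the query is n-grammed too, so any shared gram counts as a hit.
def search_ngrammed(index, term)
  query_grams = ngrams(term.downcase)
  index.select { |_, grams| (grams & query_grams).any? }.keys
end

index = build_index(EMAILS)
p like_match(EMAILS, "first-second")     # only first-second@domain.com
p search_plain(index, "first-second")    # same as the LIKE result
p search_ngrammed(index, "first-second") # also returns first@domain.com
```

<p>In this model, n-gramming the query is what lets "first-second" match "first@domain.com", because the two share grams such as "fir". Leaving the query un-grammed behaves like the substring match I want, with the limitation that terms shorter than min_gram can never match.</p>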