What are the best practices to create a Solr-based de-duplication system?
<p>I am setting up a Solr-based de-duplication system that returns the records matching a set of search criteria. I used the DataImportHandler to pull data from the database and index it as documents on the Solr server.</p>
<p>My Solr schema is as follows:</p>
<pre><code>&lt;field name="customer_id" type="int" indexed="true" stored="true" required="true" /&gt;
&lt;field name="fname" type="phonetic" indexed="true" stored="true" /&gt;
&lt;field name="lname" type="phonetic" indexed="true" stored="true"/&gt;
&lt;field name="address" type="text_en" indexed="true" stored="true" /&gt;
&lt;field name="city" type="string" indexed="true" stored="true" /&gt;
&lt;field name="state" type="string" indexed="true" stored="true" /&gt;
&lt;field name="zipcode" type="string" indexed="true" stored="true" /&gt;
&lt;field name="telephone" type="string" indexed="true" stored="true" /&gt;
</code></pre>
<p>As shown above, the first name (fname) and last name (lname) fields use the phonetic field type, which enables phonetic search through DoubleMetaphoneFilterFactory. The phonetic field type is defined as follows:</p>
<pre><code>&lt;fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" &gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
    &lt;filter class="solr.LowerCaseFilterFactory" /&gt;
    &lt;filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/&gt;
    &lt;filter class="solr.DoubleMetaphoneFilterFactory" inject="true"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldtype&gt;
</code></pre>
<p>I want my searches to return only the documents that match all of the specified query fields, not documents that match just one of them.</p>
<p>My problem is that if I search on fname, lname, or address alone, the results are quite relevant, but when I add a filter query to the primary search query, the results contain the union of the results for both criteria.</p>
<p>Can somebody point out what I am doing wrong? <strong>Also, are there any best practices to keep in mind when designing a Solr schema for a bank's de-duplication system that must identify duplicate customer records?</strong></p>
<p>Thanks in advance!</p>
 
