
<p>You can't use LIKE if you care about performance.</p>
<p>If you are trying to do approximate string matching (e.g. Test, est, best, etc.) and you don't want to use SQL full-text search, take a look at <a href="http://qualityofdata.com/2011/09/21/283/" rel="nofollow">this article</a>.</p>
<p>At the very least, you can shortlist the approximate matches and then run your wildcard test on them.</p>
<p>--EDIT 2--</p>
<p>Your problem is interesting in the context of your limitation. Thinking about it again, I am pretty sure that using 3-grams would boost the performance (going back to my initial suggestion).</p>
<p>Say you set up your 3-gram data; you will then have the following tables:</p>
<pre><code>Customer       : 14M
Customer3Grams : Maximum 700M //Considering the field is varchar(50)
3Grams         : 78
Pattern        : 1000
Pattern3Grams  : 50K
</code></pre>
<p>To join patterns to customers, you then need the following join:</p>
<p>Pattern x Pattern3Grams x Customer3Grams x Customer</p>
<p>With appropriate indexing (which is easy), each look-up can happen in O(LOG(50K)+LOG(700M)+LOG(14M)), which comes to about 47.6 using natural logarithms (about 68.7 in base 2).</p>
<p>Assuming appropriate indexes are present, the whole join can be computed with fewer than 50,000 look-ups, plus of course the cost of scanning after the look-ups. I expect it to be very efficient (a matter of seconds).</p>
<p>The cost of creating 3-grams for each new customer is also minimal, because at most 50x75 possible 3-grams would need to be appended to the Customer3Grams table.</p>
<p>--EDIT--</p>
<p>Depending on your data, I can also suggest hash-based clustering. I assume customer numbers are numbers with some character patterns in them (e.g. 123231ttt3x4). If this is the case, you can create a simple hash function that calculates the bit-wise OR of every letter (not the digits) and add the result as an indexed column to your table. You can filter on the result of the hash before applying LIKE.</p>
<p>Depending on your data, this may cluster your data effectively and improve your search by a factor of the number of clusters (the number of distinct hash values). You can test it by applying the hash and counting the number of distinct hashes generated.</p>
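<p>The 3-gram shortlist idea can be illustrated outside the database. The following is only a minimal in-memory sketch, not the linked article's method: the function names (<code>trigrams</code>, <code>build_index</code>, <code>shortlist</code>, <code>search</code>) are hypothetical, the dictionary stands in for the Customer3Grams table, and it assumes the pattern is a plain substring search such as LIKE '%ttt3%'.</p>

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character substrings of text (case-insensitive)."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(customers):
    """Map each 3-gram to the ids of the customers containing it.

    `customers` is a dict {id: value}; the result plays the role of the
    Customer3Grams table in this sketch.
    """
    index = defaultdict(set)
    for cid, value in customers.items():
        for gram in trigrams(value):
            index[gram].add(cid)
    return index


def shortlist(index, literal):
    """Ids of customers that contain every 3-gram of `literal`.

    Returns None when the literal is shorter than 3 characters, because
    the index cannot narrow the search in that case.
    """
    grams = trigrams(literal)
    if not grams:
        return None
    return set.intersection(*(index.get(g, set()) for g in grams))


def search(customers, index, literal):
    """Emulate LIKE '%literal%': shortlist by 3-grams, then verify."""
    candidates = shortlist(index, literal)
    if candidates is None:
        candidates = customers.keys()  # fall back to a full scan
    needle = literal.lower()
    return sorted(cid for cid in candidates if needle in customers[cid].lower())
```

<p>The final substring test is still needed, because sharing all 3-grams does not guarantee that they appear contiguously and in order; the index only reduces how many rows that test has to touch.</p>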
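<p>The answer does not spell out the hash function, so the following is one possible reading, stated as an assumption: each ASCII letter is mapped to one bit and the bits are OR-ed together, ignoring digits. The names <code>letter_mask</code>, <code>may_match</code> and <code>like_contains</code> are hypothetical. With this reading, a row can only contain a pattern's literal text if its mask includes every bit of the pattern's mask, so the cheap mask test can run before the expensive LIKE-style test.</p>

```python
def letter_mask(value):
    """OR together one bit per distinct ASCII letter in value.

    Digits and any other characters are ignored, as in the description.
    """
    mask = 0
    for ch in value.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def may_match(row_mask, pattern_mask):
    """True if the row has every letter that the pattern has."""
    return (row_mask & pattern_mask) == pattern_mask


def like_contains(rows, literal):
    """Emulate LIKE '%literal%' over (value, mask) pairs.

    The mask check discards rows that cannot match; the substring test
    then confirms the remaining candidates.
    """
    pattern_mask = letter_mask(literal)
    needle = literal.lower()
    return [value for value, mask in rows
            if may_match(mask, pattern_mask) and needle in value.lower()]
```

<p>In a database, the mask would be stored in the indexed column; counting its distinct values, as suggested above, shows how finely it splits the data.</p>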