<p><strong>You could start by prioritizing which emails to compare to one another.</strong></p>
<p>A key reason for the performance limitations is the O(n<sup>2</sup>) cost of comparing each address to every other address. <strong>Prioritization is the key to improving the performance of this kind of search algorithm.</strong></p>
<p>For instance, you could bucket all emails that have a similar length (+/- some amount) and compare within that subset first. You could also strip all special characters (numbers, symbols) from the emails and find those that are identical after that reduction.</p>
<p>You may also want to build a trie from the data rather than processing it line by line, then use it to find all emails that share a common set of suffixes/prefixes and drive your comparison logic from that reduced set. From the examples you provided, it looks like you are looking for addresses where part of one address appears as a substring within another. <a href="http://en.wikipedia.org/wiki/Trie" rel="nofollow noreferrer">Tries</a> (and <a href="http://en.wikipedia.org/wiki/Suffix_tree" rel="nofollow noreferrer">suffix trees</a>) are efficient data structures for these types of searches.</p>
<p>Another possible optimization is to use the date when each email account was created (assuming you know it). Duplicate accounts are likely to be created within a short period of time of one another, so this may help you reduce the number of comparisons to perform when looking for duplicates.</p>
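<p>As a rough illustration of the first two ideas (length bucketing and stripping special characters), here is a minimal Python sketch. The function names, the <code>tolerance</code> parameter, and the choice to strip only the part before the <code>@</code> are assumptions made for this example, not something taken from the question.</p>

```python
import re
from collections import defaultdict
from itertools import combinations


def normalize(email):
    """Lowercase the address and drop digits and symbols from the local part.

    Assumption: only the part before '@' is reduced; the domain is kept as-is.
    """
    local, _, domain = email.lower().partition("@")
    return re.sub(r"[^a-z]", "", local) + "@" + domain


def group_by_normal_form(emails):
    """Return groups of addresses that are identical after normalization."""
    groups = defaultdict(list)
    for email in emails:
        groups[normalize(email)].append(email)
    # Only groups with more than one member are possible duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}


def candidate_pairs_by_length(emails, tolerance=1):
    """Yield pairs of addresses whose lengths differ by at most `tolerance`."""
    by_length = defaultdict(list)
    for email in emails:
        by_length[len(email)].append(email)
    for length in sorted(by_length):
        bucket = by_length[length]
        # Pairs inside the same length bucket.
        yield from combinations(bucket, 2)
        # Pairs with the slightly longer buckets; shorter ones were
        # already paired when their own bucket was visited.
        for delta in range(1, tolerance + 1):
            for other in by_length.get(length + delta, []):
                for email in bucket:
                    yield (email, other)
```

<p>The grouping pass is linear in the number of addresses, so it is cheap to run first. The length buckets then limit the more expensive pairwise comparison (for example, an edit-distance check) to pairs that could plausibly match.</p>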