Secure way of exchanging email addresses (hashing) to allow matching for overlap on another list, but not reveal those for which there is no overlap?
<p>I'm with an organization (Company A) that has a large email list. I'm sending a 10,000-address subset of this list to another organization (Company B) to test for overlap (that is, to discover which email addresses are on both lists). I want to send the list in a way that makes it easy for Company B to test for overlap, but difficult (ideally impossible) for Company B to "decode" the email addresses that are NOT already on their list. Secondarily, I want to ensure that if the list I send ends up in the wrong hands (some third party), it would be difficult for anyone else to learn the actual email addresses on it.</p>
<p>My current solution is to simply pull the emails from our database as</p>
<pre><code>SHA1(CONCAT(email, 'a_long_random_salt'))</code></pre>
<p>using the same salt for every email address.</p>
<p>To do the match, I send Company B the list of hashes and, separately and securely, the salt. They then search their database using</p>
<pre><code>SELECT email FROM members WHERE SHA1(CONCAT(email, 'the_salt')) IN ('hash1', 'hash2', 'hash3', ...)</code></pre>
<p>(Alternatively, they can pre-compute the SHA1 hash for each address and store it in the database alongside the email address, so the hashing doesn't have to happen when the query runs.)</p>
<p>A sufficiently long, random salt protects against the use of a precomputed rainbow table to crack the hashes. I assume it is rather unlikely that anyone has a rainbow table of many millions of plausible email addresses salted with whatever 100-character random string I use as my salt. As long as the salt is kept secret, no third party is going to decode this list with a rainbow table or brute force. (Please correct me if I'm wrong here.)</p>
<p>The issue I'm struggling with is that there are easily obtained lists of many millions of email addresses harvested from the web.
It would be fairly easy for Company B to obtain one of these lists, compute the hashes using the salt I've provided, and recover a significant portion of the emails on the list I've sent (certainly not all of them, but a significant portion).</p>
<p>Is there some strategy for this match that I'm failing to come up with? The only thing I can think of is to use a more expensive hashing method (e.g. multiple iterations) to make it <strong>slower</strong> to match against a list of hundreds of millions of email addresses (that hypothetical list scraped from the web). The catch is that it would only be <strong>slower</strong>, not actually <strong>difficult</strong>. Also, I know that Company B's own email list is around 1 million addresses, so I can't give them a hashing scheme that takes many seconds to compute for each of those 1 million addresses. Simply making it slower doesn't solve the issue; I think I need a completely different approach.</p>
<p>Honestly, this particular case is more of an academic exercise for me than a real security concern. I trust that Company B is not going to try this (we work together often), and even if they did, it would be no huge loss. All they could possibly learn is the email addresses of up to 10,000 people on our mailing list; we're not talking about passwords, credit card numbers, etc. If we were dealing with passwords or credit card numbers, I wouldn't even consider developing a scheme of my own. And yes, I realize that SHA-256 or some other newer algorithm might be somewhat preferable to SHA1, but only to a very limited extent. It's not a brute-force crack of the hash that I'm worried about here.</p>
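A minimal Python sketch of the scheme as described, assuming Python's standard `hashlib` stands in for the database's `SHA1()`. Company A hashes its addresses with the shared salt, and Company B hashes its own addresses and checks membership. It also makes the concern above concrete: the "attack" is the identical computation, just run over a scraped list. The function names, sample addresses, and iteration count are hypothetical. `iterated_hash` uses PBKDF2-HMAC-SHA256 as one standard way to apply the "multiple iterations" idea, not as the question's own method.

```python
import hashlib
import secrets
from typing import Callable, Iterable, Set

# Hypothetical signature for a hashing scheme: (email, salt) -> hex digest.
HashFn = Callable[[str, str], str]


def salted_sha1(email: str, salt: str) -> str:
    # Same idea as SHA1(CONCAT(email, salt)): SHA-1 over the UTF-8 bytes of
    # the email followed by the salt, returned as 40 lowercase hex characters.
    return hashlib.sha1((email + salt).encode("utf-8")).hexdigest()


def iterated_hash(email: str, salt: str, iterations: int = 100_000) -> str:
    # One standard way to do "multiple iterations": PBKDF2-HMAC-SHA256.
    # The default iteration count is an arbitrary example, not a recommendation.
    return hashlib.pbkdf2_hmac(
        "sha256", email.encode("utf-8"), salt.encode("utf-8"), iterations
    ).hex()


def company_a_export(emails: Iterable[str], salt: str,
                     hash_fn: HashFn = salted_sha1) -> Set[str]:
    # Company A ships only the hashes; the salt is sent separately.
    return {hash_fn(e, salt) for e in emails}


def company_b_match(hashes: Set[str], candidates: Iterable[str], salt: str,
                    hash_fn: HashFn = salted_sha1) -> Set[str]:
    # Hash every candidate address once and keep those present in A's set.
    # Assumes both sides normalise addresses (e.g. lowercase) identically.
    return {e for e in candidates if hash_fn(e, salt) in hashes}


# The "attack" is the same computation: Company B simply passes a scraped
# address list as `candidates` instead of its own member list.
dictionary_attack = company_b_match


if __name__ == "__main__":
    salt = secrets.token_hex(50)  # 100 hex characters of random salt
    list_a = ["alice@example.com", "bob@example.org", "carol@example.net"]
    members_b = ["bob@example.org", "dave@example.com"]
    scraped = ["carol@example.net", "erin@example.com"]  # hypothetical harvested list

    sent = company_a_export(list_a, salt)
    print(sorted(company_b_match(sent, members_b, salt)))  # ['bob@example.org']
    print(sorted(dictionary_attack(sent, scraped, salt)))  # ['carol@example.net']
```

Because the legitimate match and the attack both cost one hash per candidate address, passing `hash_fn=iterated_hash` multiplies Company B's cost for its roughly 1 million members and the cost of a scan over a scraped list by the same factor. The attacker's work stays proportional to the size of the scraped list, so iteration makes the attack slower but not harder, as noted above.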