StackOverflow2013

plurals

POSecure way of exchanging email addresses (hashing) to allow matching for overlap on another list, but not reveal those for which there is no overlap?
primarykey
Id
6155851
data
AcceptedAnswerId
0
AnswerCount
3
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2011-05-27T17:41:50.167
FavoriteCount
1
LastActivityDate
2011-05-30T07:14:14.360
LastEditDate
LastEditorUserId
0
OwnerUserId
408945
ParentId
0
PostTypeId
1
Score
1
ViewCount
2041
LastEditorDisplayName
text
Body
I'm with an organization (Company A) that has a large email list. I'm sending a 10,000-address subset of this list to another organization (Company B) to test for overlap (that is, to discover which email addresses are on both lists). I want to send the list in a way that makes it easy for Company B to test for overlap, but difficult (ideally impossible) for Company B to "decode" the email addresses that are NOT already on their list. Secondarily, I want to ensure that if the list I send winds up in the wrong hands (some 3rd party), it would be difficult for anyone else to learn the actual email addresses on it.
My current solution is to simply pull the emails from our database as <pre><code>SHA1(email + a_long_random_salt) </code></pre> using the same salt for each email address. To do the match, I send the list of hashes and the salt (securely, separately) to Company B, and they simply search their database using <pre><code>SELECT email FROM members WHERE SHA1(email + the_salt) IN(hash1, hash2, hash3....) </code></pre> (Or they pre-compute the SHA1 hash for each address and store it in the DB alongside the email address, so the hashing doesn't need to happen while the query runs.)
A sufficiently long, random salt protects against the use of a precomputed rainbow table to crack the hashes. I assume it is rather unlikely that anyone has a rainbow table of millions upon millions of plausible email addresses salted with whatever 100-character random string I use as my salt. As long as the salt is kept secret, no 3rd party is going to decode this list with a rainbow table or brute force. (Please correct me if I'm somehow wrong here.)
The issue I'm struggling with is that there are obviously easily obtained lists of millions upon millions of email addresses harvested from the web. It would be pretty easy for Company B to obtain one of these lists, compute the hashes using the salt I've provided, and recover a significant portion of the emails on the list I've sent (certainly not all, but a significant portion). Is there some strategy to accomplish this match that I'm failing to come up with?
The only thing I can think of is to use a more complex hashing method (i.e. multiple iterations) to make it slower to match against a list of hundreds of millions of email addresses (that theoretical list scraped from the web). The key is that it would really only be slower -- not really even difficult. Also, I know that Company B's own email list is in the range of 1 million addresses, so I can't give them a hashing scheme that would take many seconds to compute for each address on that list of 1 million. Simply making it slower doesn't solve the issue -- I think I need a completely different approach.
Honestly, in this particular case this is more of an academic exercise for me than a real security concern. I trust that Company B is not going to try to do this (we work together often), and even if they did, it would be no huge loss. All they could possibly learn is the email addresses of 10,000 people on our mailing list -- we're not talking about passwords, credit card numbers, etc. If we were dealing with passwords or credit card numbers, I wouldn't even be considering developing some scheme of my own. And, yes, of course I realize that SHA-256 or some other newer algorithm might be a bit preferable to SHA1, but only to a very limited extent. It's not a brute-force crack of the hash that I'm worried about here.
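To make the scheme and its weakness concrete, here is a minimal Python sketch of the approach described in the question. The helper names (`salted_sha1`, `find_overlap`), the sample addresses, and the salt value are all hypothetical, chosen only for illustration; a real salt would be the long random string mentioned above. The sketch assumes both sides hash byte-for-byte identical strings (same case, same encoding), which the question does not spell out.

```python
import hashlib


def salted_sha1(email: str, salt: str) -> str:
    """Hex SHA-1 digest of email + salt, mirroring SHA1(email + a_long_random_salt)."""
    return hashlib.sha1((email + salt).encode("utf-8")).hexdigest()


def find_overlap(received_hashes, candidate_emails, salt):
    """Return the candidate emails whose salted hash appears in received_hashes."""
    wanted = set(received_hashes)
    return [e for e in candidate_emails if salted_sha1(e, salt) in wanted]


# Hypothetical sample data; the salt here is a placeholder, not a secure value.
salt = "placeholder-for-a-long-random-salt"
company_a = ["alice@example.com", "bob@example.com", "carol@example.com"]
company_b = ["bob@example.com", "dave@example.com"]
harvested = ["alice@example.com", "erin@example.com"]  # stand-in for a scraped list

# Company A sends only the hashes (plus the salt, separately).
sent_hashes = [salted_sha1(e, salt) for e in company_a]

# Intended use: Company B matches against its own members.
overlap = find_overlap(sent_hashes, company_b, salt)

# The concern: anyone holding the salt can run the same match
# against any other list of plausible addresses.
recovered = find_overlap(sent_hashes, harvested, salt)
```

The same function serves both the legitimate match and the dictionary attack, which is the core of the problem: whoever holds the salt can test any address they can guess.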
Tags
<security><encryption><sha1><salt>
Title
Secure way of exchanging email addresses (hashing) to allow matching for overlap on another list, but not reveal those for which there is no overlap?
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USJase
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POSecure way of exchanging email addresses (hashing) to allow matching for overlap on another list, but not reveal those for which there is no overlap?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. CO I should add that any solution needs to be relatively straightforward to implement. It can't be something that takes more than an hour or two to set up. And learning that there simply is no practical way of doing what I'm asking would be equally useful if that's the right answer.
 singulars
 PostPostId
 POSecure way of exchanging email addresses (hashing) to allow matching for overlap on another list, but not reveal those for which there is no overlap?
 UserUserId
 USJase
