Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>I'm actually surprised that you are not getting collisions before you normalize the range of hash values. It looks like you are using an unnormalized range of [0,2^32). Looking at the Birthday Problem chart <a href="https://en.wikipedia.org/wiki/Birthday_problem#Approximations" rel="nofollow">here</a> the probability of collision with 4*10^7 elements should be higher than 75%. In any case, normalizing the hash outputs to a range equal to the size of the set of elements is practically guaranteeing a non-trivial number of collisions. Unless you're willing to use a counter for your hash values I don't see how you'll be able to avoid that.</p> <p>EDIT: Saw your edit. Even with a range 100 times the number of elements (which is about 4*10*9) you are still likely to get a lot of collisions. As indicated above, the probability of one or more collisions is well over 75%.</p> <p>There are two things I would suggest:</p> <p><strong>Choose a Different Hash Function</strong></p> <p>As you noted, while your hash function is fast, it will not distribute values randomly in the range [0,2^32). There are several hash functions out there that are both fast and do a better job distributing hash values across the hash function range. One that I've used in the past is the <a href="https://en.wikipedia.org/wiki/MurmurHash" rel="nofollow">MurmurHash</a>.</p> <p><strong>Use a Larger Range</strong></p> <p>Using a larger range should reduce the risk of collisions. Looking at the chart <a href="https://en.wikipedia.org/wiki/Birthday_problem#Approximations" rel="nofollow">here</a> again it looks like 64 bits should be enough to reduce the risk of collision to less than 10^-6. The MurmurHash64A and MurmurHash64B variants will be useful in this case.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload