This is kind of a strange question. Let's start with the obvious errors in the code:

```
// Use large prime multiples to create a unique hash key
// Create the hash offsets using a "even powers of 2 minus 1" method, which gives
// primes most of the time.
```

First off, those are all odd powers of two minus one; none of them are even powers of two minus one.

Second, of the four multipliers you've chosen as "large prime multiples", half of them are not prime: 2047 and 32767 are composite (a quick factorization check appears below).

Third, if we "correct" -- and I use the word advisedly -- the statement to be "odd powers of 2 minus one, which gives primes most of the time", that statement is *absurdly* wrong. A prime of that form is known as a Mersenne prime, and **there are only 47 known Mersenne primes**. I assure you that the density of Mersenne primes is considerably lower than one half. Put it this way: of the odd-power Mersenne numbers between 2^1 - 1 and 2^43112609 - 1, 46 are known to be prime, which is about one in half a million, not one in two.

Fourth, what do you imagine prime numbers have to do with anything? What mythological power do prime numbers have? Surely what matters is that the distribution of hash codes tends not to produce multiples of the hash table size. Since the hash table size is *chosen to be a prime number*, it seems like this could actually be *exacerbating* the problem (a small demo below makes the failure mode concrete).

Fifth, hash keys are not unique; your question is about when they collide, so clearly they cannot be unique.

Sixth, suppose your hash function had a *perfectly random* distribution across the space of 32-bit integers. By the birthday "paradox" you'd expect a far greater than 99% chance of at least one collision when drawing ten million numbers at random from a 32-bit space. In fact, the expected number of collisions would be on the order of ten or twenty thousand. (We could work out the exact expected number -- the short computation below does -- but who cares what it is exactly; it is in that order of magnitude.)

Is that too many collisions? It is going to be very difficult to do better than a random distribution. If you require fewer collisions than that, then you shouldn't be using a 32-bit hashing algorithm in the first place.

Seventh, who cares how many collisions a hash function has across its full range? Surely the practical question ought really to be "how does this hash perform with realistic data in a large table?" You, unlike us, can answer that question by *trying it*; a minimal counting harness follows at the end of this answer. If it meets your performance budget, great; worry about something else. If it doesn't, figure out why not before you start blaming the hash function.

I am very confused by this question and what you hope to gain from its answer. Can you explain?
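A quick check of the second point. The two composite multipliers are named above as 2047 and 32767; the question's other two multipliers are not quoted here, so this sketch assumes they were the neighbouring odd-exponent values 8191 (2^13 - 1) and 131071 (2^17 - 1):

```python
# Factor the four "large prime multiples". 2047 and 32767 fall apart
# immediately; 8191 and 131071 happen to be genuine Mersenne primes.

def smallest_factor(n: int) -> int:
    """Return the smallest prime factor of n (n itself if n is prime)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n

for k in (11, 13, 15, 17):
    m = 2 ** k - 1
    f = smallest_factor(m)
    verdict = "prime" if f == m else f"composite ({f} * {m // f})"
    print(f"2^{k} - 1 = {m}: {verdict}")

# Output:
#   2^11 - 1 = 2047: composite (23 * 89)
#   2^13 - 1 = 8191: prime
#   2^15 - 1 = 32767: composite (7 * 4681)
#   2^17 - 1 = 131071: prime
```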
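The small demo promised in the fourth point. This is not the asker's code, just an illustrative sketch: when a hash function tends to emit multiples of the table size, a prime table size does nothing to rescue the distribution; every such key lands in bucket 0.

```python
# A prime table size only guards against *other* periodic patterns; hash
# codes that are themselves multiples of the table size all collide.

m = 31                                   # a prime table size
hashes = [k * m for k in range(1, 11)]   # hash codes: multiples of m
buckets = [h % m for h in hashes]
print(buckets)                           # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```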
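The computation promised in the sixth point, using the standard occupancy formula for n uniform draws from d equally likely values; this is a sketch of the estimate, not a measurement of any particular hash function.

```python
import math

n = 10_000_000   # draws
d = 2 ** 32      # size of the 32-bit space

# Expected number of distinct values among n uniform draws from d values
# is d * (1 - (1 - 1/d)^n); the shortfall from n is the collision count.
expected_distinct = d * (1.0 - math.exp(n * math.log1p(-1.0 / d)))
print(f"expected collisions: {n - expected_distinct:,.0f}")   # ~11,600
print(f"n^2 / (2d) approximation: {n * n / (2 * d):,.0f}")    # ~11,641

# Probability of at least one collision: 1 - exp(-n(n-1)/(2d)). The
# exponent is about -11,641, so floating point rounds this to exactly 1.0.
print(1.0 - math.exp(-n * (n - 1) / (2 * d)))
```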
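And the harness for the seventh point. This is a hypothetical sketch rather than anything from the question: `count_collisions` and the synthetic `sample_keys` are stand-ins, to be replaced with the hash function under test and keys drawn from the real workload.

```python
from collections import Counter
from typing import Callable, Iterable

def count_collisions(hash_fn: Callable, keys: Iterable) -> int:
    """Count colliding key pairs under a hash truncated to 32 bits."""
    counts = Counter(hash_fn(k) & 0xFFFFFFFF for k in keys)
    return sum(c * (c - 1) // 2 for c in counts.values())

# Built-in hash over synthetic keys, purely as a placeholder; for a
# million keys, a random 32-bit hash would collide about 116 times.
sample_keys = (f"user-{i}" for i in range(1_000_000))
print(count_collisions(hash, sample_keys))
```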