Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>First off, it's not about picking a prime number. For your example, if you know your data set will be in the range 1 to 10,000, picking 127 or 128 won't make a difference bc it's a poor design choice. </p> <p>Rather, it's better to pick a REALLY large prime like 3967 for your example so that each data will have its own unique key/value pair. You just want to also minimize collisions. Picking 127 or 128 for your example won't make a difference bc all 127/128 buckets will be uniformly filled (this is bad and will degrade the insertion and lookup run time O(1) to O(n)) as opposed to 3967 (which will preserve the O(1) run times)</p> <p>EDIT #4</p> <blockquote> <p>The design of the "hash function" is somewhat of a black art. It can be highly influenced by the data that's intended to be stored in the hashing-based data structure, so the discussion on a sensible hashing function can often stray into a discussion about specific inputs.</p> <p>As why primes are "preferred", one has to consider an "adversary" analysis, that is suppose I designed a general hashing-based data structure, how would it perform given the worst input from an adversary. Since performance is dictated by hashing collisions the question becomes what's the hash to use that minimizes collision in the worst condition. One such condition is when the input are always numbers divisible by some integer, say 4. If you use N = 128 then any number divisible by 4 mod 128 is still divisible by 4, which means only buckets 4, 8, 12, ... are always ever used, resulting in 25% utilization of the data structure. Primes effectively reduces the likelihood of such scenario occurring, with numbers > N.</p> </blockquote>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload