StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>If <code>(x1, y1)</code> and <code>(x2, y2)</code> are two random pairs of inputs, then let <code>f1 = f(x1,y1)</code> and <code>f2 = f(x2,y2)</code>.</p> <p>What you want to do is minimise</p> <pre><code>P( f(x1,y1) = f(x2,y2) ) = P(f1 = f2) = sum for i in [LONG_MIN ... LONG_MAX] of P(f1 = i) * P(f2 = i) = sum for i in [LONG_MIN ... LONG_MAX] of P(f1 = i)^2 </code></pre> <p>So you want to minimise the sum of the squares of the probabilities of each of your function's outputs. Since the <em>sum</em> of the probabilities must be 1, we know:</p> <pre><code>sum for i in [LONG_MIN ... LONG_MAX] of P(f1 = i) = 1 </code></pre> <p>And we also know that for all i, <code>P(f1 = i)</code> is between 0 and 1 (inclusive). Intuitively, then, minimising <code>P(f1 = f2)</code> is a matter of making the probability distribution of <code>f1</code> as even as possible. (This can be proven mathematically but it's not really important to the question.) Ideally, <code>P(f1 = i)</code> and <code>P(f1 = j)</code> should be the same for all <em>long</em>s <code>i</code> and <code>j</code>.</p> <p>Now let's look at some different possibilities for the nature of x and y.</p> <p>First the general case, where x and y are uniformly distributed over the range of a <em>long</em>. (In other words, x is equally likely to be anything a long can be. So is y.) In this case, we can let <code>f(x, y) = x+y</code>, or <code>f(x,y) = x-y</code>, or <code>f(x,y) = x XOR y</code>, or even <code>f(x,y) = x</code> and (assuming normal integer overflow) we find that we have a uniformly distributed f, too, which means these functions are all "optimal".</p> <p>But the <code>f(x,y) = x</code> example shows you that there's really not that much you can gain here.</p> <p>However, in practice, your x and y will probably <em>not</em> be uniformly distributed. For instance, if x and y both drawn randomly from the range [0, 9999], then using <code>f(x,y) = x + y * 10000</code> will <em>always</em> produce different output for different input.</p> <p>If, in each (x, y) pair, x and y are very likely to be near each other, for instance (1240,1249), (1,3), (-159720,-159721), then <code>f(x,y) = x</code> is actually quite a good candidate function.</p> <p>If x and y are "probably not huge", then you should combine the 16 low bits of x with the 16 low bits of y, i.e. <code>f(x,y) = ((x&0xFFFF) << 16) | (y&0xFFFF)</code>, because the lower bits will be more evenly distributed than the upper bits.</p> <p>This works very well if x and y are never negative. But if they are, the sign bit (that tells whether the number is positive or negative) might be more evenly distributed than some of the 16 low bits. So you may want to use it instead. E.g.</p> <pre><code>f(x,y) = ((x&0x7FFF) << 17) | ((y&0x7FFF) << 2) | ((x>>30) & 2) | (y>>31) </code></pre> <p>As the "probably not huge" case is quite common, I think this function would actually work quite well in general.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload