StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>You can keep a priority queue of the 100 biggest numbers, iterate through the billion numbers, whenever you encounter a number greater than the smallest number in the queue (the head of the queue), remove the head of the queue and add the new number to the queue.</p> <p><strong>EDIT:</strong> as Dev noted, with a priority queue implemented with a heap, the complexity of insertion to queue is <code>O(logN)</code></p> <p>In the worst case you get <code>billion<em>log<sub>2</sub>(100)</code> which is better than <code>billion</em>log<sub>2</sub>(billion)</code></p> <p>In general, if you need the largest K numbers from a set of N numbers, the complexity is <code>O(NlogK)</code> rather than <code>O(NlogN)</code>, this can be very significant when K is very small comparing to N.</p> <p><strong>EDIT2:</strong></p> <p>The expected time of this algorithm is pretty interesting, since in each iteration an insertion may or may not occur. The probability of the i'th number to be inserted to the queue is the probability of a random variable being larger than at least <code>i-K</code> random variables from the same distribution (the first k numbers are automatically added to the queue). We can use order statistics (see <a href="http://en.wikipedia.org/wiki/Uniform_distribution_%28continuous%29#Order_statistics" rel="noreferrer">link</a>) to calculate this probability. For example, lets assume the numbers were randomly selected uniformly from <code>{0, 1}</code>, the expected value of (i-K)th number (out of i numbers) is <code>(i-k)/i</code>, and chance of a random variable being larger than this value is <code>1-[(i-k)/i] = k/i</code>. </p> <p>Thus, the expected number of insertions is:</p> <p><img src="https://i.stack.imgur.com/6Vgjk.jpg" alt="enter image description here"></p> <p>And the expected running time can be expressed as:</p> <p><img src="https://i.stack.imgur.com/Mg6J1.jpg" alt="enter image description here"></p> <p>(<code>k</code> time to generate the queue with the first <code>k</code> elements, then <code>n-k</code> comparisons, and the expected number of insertions as described above, each takes an average <code>log(k)/2</code> time)</p> <p>Note that when <code>N</code> is very large comparing to <code>K</code>, this expression is a lot closer to <code>n</code> rather than <code>NlogK</code>. This is somewhat intuitive, as in the case of the question, even after 10000 iterations (which is very small comparing to a billion), the chance of a number to be inserted to the queue is very small.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload