Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>This is actually a harder problem than it sounds like, mainly because many mathematically-correct solutions will fail to actually allow you to hit all the possibilities (more on this below).</p> <p>First, here are some easy-to-implement, correct-if-you-have-a-truly-random-number generator:</p> <p>(0) Kyle's answer, which is O(n).</p> <p>(1) Generate a list of n pairs [(0, rand), (1, rand), (2, rand), ...], sort them by the second coordinate, and use the first k (for you, k=5) indices to get your random subset. I think this is easy to implement, although it is O(n log n) time.</p> <p>(2) Init an empty list s = [] that will grow to be the indices of k random elements. Choose a number r in {0, 1, 2, ..., n-1} at random, r = rand % n, and add this to s. Next take r = rand % (n-1) and stick in s; add to r the # elements less than it in s to avoid collisions. Next take r = rand % (n-2), and do the same thing, etc. until you have k distinct elements in s. This has worst-case running time O(k^2). So for k &lt;&lt; n, this can be faster. If you keep s sorted and track which contiguous intervals it has, you can implement it in O(k log k), but it's more work.</p> <p>@Kyle - you're right, on second thought I agree with your answer. I hastily read it at first, and mistakenly thought you were indicating to sequentially choose each element with fixed probability k/n, which would have been wrong - but your adaptive approach appears correct to me. Sorry about that.</p> <p>Ok, and now for the kicker: asymptotically (for fixed k, n growing), there are n^k/k! choices of k element subset out of n elements [this is an approximation of (n choose k)]. If n is large, and k is not very small, then these numbers are huge. The best cycle length you can hope for in any standard 32 bit random number generator is 2^32 = 256^4. So if we have a list of 1000 elements, and we want to choose 5 at random, there's no way a standard random number generator will hit all the possibilities. However, as long as you're ok with a choice that works fine for smaller sets, and always "looks" random, then these algorithms should be ok.</p> <p><strong>Addendum</strong>: After writing this, I realized that it's tricky to implement idea (2) correctly, so I wanted to clarify this answer. To get O(k log k) time, you need an array-like structure that supports O(log m) searches and inserts - a balanced binary tree can do this. Using such a structure to build up an array called s, here is some pseudopython:</p> <pre><code># Returns a container s with k distinct random numbers from {0, 1, ..., n-1} def ChooseRandomSubset(n, k): for i in range(k): r = UniformRandom(0, n-i) # May be 0, must be &lt; n-i q = s.FirstIndexSuchThat( s[q] - q &gt; r ) # This is the search. s.InsertInOrder(q ? r + q : r + len(s)) # Inserts right before q. return s </code></pre> <p>I suggest running through a few sample cases to see how this efficiently implements the above English explanation.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload