Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>If the list is sorted, for example, if you want to extract K elements out of N, but you do not care about their relative order, an efficient algorithm is proposed in the paper <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.1689&amp;rep=rep1&amp;type=pdf" rel="nofollow noreferrer"><em>An Efficient Algorithm for Sequential Random Sampling</em></a> (Jeffrey Scott Vitter, <em>ACM Transactions on Mathematical Software</em>, Vol. 13, No. 1, March 1987, Pages 56-67.).</p> <p><strong>edited</strong> to add the code in c++ using boost. I've just typed it and there might be many errors. The random numbers come from the boost library, with a stupid seed, so don't do anything serious with this.</p> <pre><code>/* Sampling according to [Vitter87]. * * Bibliography * [Vitter 87] * Jeffrey Scott Vitter, * An Efficient Algorithm for Sequential Random Sampling * ACM Transactions on MAthematical Software, 13 (1), 58 (1987). */ #include &lt;stdlib.h&gt; #include &lt;string.h&gt; #include &lt;math.h&gt; #include &lt;string&gt; #include &lt;iostream&gt; #include &lt;iomanip&gt; #include &lt;boost/random/linear_congruential.hpp&gt; #include &lt;boost/random/variate_generator.hpp&gt; #include &lt;boost/random/uniform_real.hpp&gt; using namespace std; // This is a typedef for a random number generator. // Try boost::mt19937 or boost::ecuyer1988 instead of boost::minstd_rand typedef boost::minstd_rand base_generator_type; // Define a random number generator and initialize it with a reproducible // seed. // (The seed is unsigned, otherwise the wrong overload may be selected // when using mt19937 as the base_generator_type.) base_generator_type generator(0xBB84u); //TODO : change the seed above ! // Defines the suitable uniform ditribution. boost::uniform_real&lt;&gt; uni_dist(0,1); boost::variate_generator&lt;base_generator_type&amp;, boost::uniform_real&lt;&gt; &gt; uni(generator, uni_dist); void SequentialSamplesMethodA(int K, int N) // Outputs K sorted random integers out of 0..N, taken according to // [Vitter87], method A. { int top=N-K, S, curr=0, currsample=-1; double Nreal=N, quot=1., V; while (K&gt;=2) { V=uni(); S=0; quot=top/Nreal; while (quot &gt; V) { S++; top--; Nreal--; quot *= top/Nreal; } currsample+=1+S; cout &lt;&lt; curr &lt;&lt; " : " &lt;&lt; currsample &lt;&lt; "\n"; Nreal--; K--;curr++; } // special case K=1 to avoid overflow S=floor(round(Nreal)*uni()); currsample+=1+S; cout &lt;&lt; curr &lt;&lt; " : " &lt;&lt; currsample &lt;&lt; "\n"; } void SequentialSamplesMethodD(int K, int N) // Outputs K sorted random integers out of 0..N, taken according to // [Vitter87], method D. { const int negalphainv=-13; //between -20 and -7 according to [Vitter87] //optimized for an implementation in 1987 !!! int curr=0, currsample=0; int threshold=-negalphainv*K; double Kreal=K, Kinv=1./Kreal, Nreal=N; double Vprime=exp(log(uni())*Kinv); int qu1=N+1-K; double qu1real=qu1; double Kmin1inv, X, U, negSreal, y1, y2, top, bottom; int S, limit; while ((K&gt;1)&amp;&amp;(threshold&lt;N)) { Kmin1inv=1./(Kreal-1.); while(1) {//Step D2: generate X and U while(1) { X=Nreal*(1-Vprime); S=floor(X); if (S&lt;qu1) {break;} Vprime=exp(log(uni())*Kinv); } U=uni(); negSreal=-S; //step D3: Accept ? y1=exp(log(U*Nreal/qu1real)*Kmin1inv); Vprime=y1*(1. - X/Nreal)*(qu1real/(negSreal+qu1real)); if (Vprime &lt;=1.) {break;} //Accept ! Test [Vitter87](2.8) is true //step D4 Accept ? y2=0; top=Nreal-1.; if (K-1 &gt; S) {bottom=Nreal-Kreal; limit=N-S;} else {bottom=Nreal+negSreal-1.; limit=qu1;} for(int t=N-1;t&gt;=limit;t--) {y2*=top/bottom;top--; bottom--;} if (Nreal/(Nreal-X)&gt;=y1*exp(log(y2)*Kmin1inv)) {//Accept ! Vprime=exp(log(uni())*Kmin1inv); break; } Vprime=exp(log(uni())*Kmin1inv); } // Step D5: Select the (S+1)th record currsample+=1+S; cout &lt;&lt; curr &lt;&lt; " : " &lt;&lt; currsample &lt;&lt; "\n"; curr++; N-=S+1; Nreal+=negSreal-1.; K-=1; Kreal-=1; Kinv=Kmin1inv; qu1-=S; qu1real+=negSreal; threshold+=negalphainv; } if (K&gt;1) {SequentialSamplesMethodA(K, N);} else { S=floor(N*Vprime); currsample+=1+S; cout &lt;&lt; curr &lt;&lt; " : " &lt;&lt; currsample &lt;&lt; "\n"; } } int main(void) { int Ntest=10000000, Ktest=Ntest/100; SequentialSamplesMethodD(Ktest,Ntest); return 0; } $ time ./sampling|tail </code></pre> <p>gives the following ouptut on my laptop</p> <pre><code>99990 : 9998882 99991 : 9998885 99992 : 9999021 99993 : 9999058 99994 : 9999339 99995 : 9999359 99996 : 9999411 99997 : 9999427 99998 : 9999584 99999 : 9999745 real 0m0.075s user 0m0.060s sys 0m0.000s </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload