Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Weights define a probability distribution function (pdf). Random numbers from any such pdf can be generated by <a href="http://en.wikipedia.org/wiki/Pseudorandom_number_generator#Non-uniform_generators" rel="nofollow noreferrer">applying its associated inverse cumulative distribution function</a> to uniform random numbers between 0 and 1.</p> <p>See also this <a href="https://stackoverflow.com/questions/977354/generating-non-uniform-random-numbers/977508#977508">SO explanation</a>, or, as explained by <a href="http://en.wikipedia.org/wiki/Cumulative%5Fdistribution%5Ffunction#Inverse" rel="nofollow noreferrer">Wikipedia</a>:</p> <blockquote> <p>If Y has a U[0,1] distribution then F⁻¹(Y) is distributed as F. This is used in random number generation using the inverse transform sampling-method.</p> </blockquote> <pre><code>import random import bisect import collections def cdf(weights): total = sum(weights) result = [] cumsum = 0 for w in weights: cumsum += w result.append(cumsum / total) return result def choice(population, weights): assert len(population) == len(weights) cdf_vals = cdf(weights) x = random.random() idx = bisect.bisect(cdf_vals, x) return population[idx] weights=[0.3, 0.4, 0.3] population = 'ABC' counts = collections.defaultdict(int) for i in range(10000): counts[choice(population, weights)] += 1 print(counts) # % test.py # defaultdict(&lt;type 'int'&gt;, {'A': 3066, 'C': 2964, 'B': 3970}) </code></pre> <p>The <code>choice</code> function above uses <code>bisect.bisect</code>, so selection of a weighted random variable is done in <code>O(log n)</code> where <code>n</code> is the length of <code>weights</code>. </p> <hr> <p>Note that as of version 1.7.0, NumPy has a Cythonized <a href="http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html" rel="nofollow noreferrer">np.random.choice function</a>. For example, this generates 1000 samples from the population <code>[0,1,2,3]</code> with weights <code>[0.1, 0.2, 0.3, 0.4]</code>:</p> <pre><code>import numpy as np np.random.choice(4, 1000, p=[0.1, 0.2, 0.3, 0.4]) </code></pre> <p><code>np.random.choice</code> also has a <code>replace</code> parameter for sampling with or without replacement.</p> <hr> <p>A theoretically better algorithm is the <a href="http://en.wikipedia.org/wiki/Alias_method" rel="nofollow noreferrer">Alias Method</a>. It builds a table which requires <code>O(n)</code> time, but after that, samples can be drawn in <code>O(1)</code> time. So, if you need to draw many samples, in theory the Alias Method may be faster. There is a Python implementation of the Walker Alias Method <a href="http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/" rel="nofollow noreferrer">here</a>, and a <a href="https://gist.github.com/ntamas/1109133" rel="nofollow noreferrer">numpy version here</a>.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload