Note that there are some explanatory texts on larger screens.

plurals
  1. POProbability distribution in Python
    primarykey
    data
    text
    <p>I have a bunch of keys that each have an unlikeliness variable. I want to randomly choose one of these keys, yet I want it to be more unlikely for unlikely (key, values) to be chosen than a less unlikely (a more likely) object. I am wondering if you would have any suggestions, preferably an existing python module that I could use, else I will need to make it myself.</p> <p>I have checked out the random module; it does not seem to provide this.</p> <p>I have to make such choices many millions of times for 1000 different sets of objects each containing 2,455 objects. Each set will exchange objects among each other so the random chooser needs to be dynamic. With 1000 sets of 2,433 objects, that is 2,433 million objects; low memory consumption is crucial. And since these choices are not the bulk of the algorithm, I need this process to be quite fast; CPU-time is limited.</p> <p>Thx </p> <p>Update:</p> <p>Ok, I tried to consider your suggestions wisely, but time is so limited... </p> <p>I looked at the binary search tree approach and it seems too risky (complex and complicated). The other suggestions all resemble the ActiveState recipe. I took it and modified it a little in the hope of making more efficient:</p> <pre><code>def windex(dict, sum, max): '''an attempt to make a random.choose() function that makes weighted choices accepts a dictionary with the item_key and certainty_value as a pair like: &gt;&gt;&gt; x = [('one', 20), ('two', 2), ('three', 50)], the maximum certainty value (max) and the sum of all certainties.''' n = random.uniform(0, 1) sum = max*len(list)-sum for key, certainty in dict.iteritems(): weight = float(max-certainty)/sum if n &lt; weight: break n = n - weight return key </code></pre> <p>I am hoping to get an efficiency gain from dynamically maintaining the sum of certainties and the maximum certainty. Any further suggestions are welcome. You guys saves me so much time and effort, while increasing my effectiveness, it is crazy. Thx! Thx! Thx!</p> <p>Update2:</p> <p>I decided to make it more efficient by letting it choose more choices at once. This will result in an acceptable loss in precision in my algo for it is dynamic in nature. Anyway, here is what I have now:</p> <pre><code>def weightedChoices(dict, sum, max, choices=10): '''an attempt to make a random.choose() function that makes weighted choices accepts a dictionary with the item_key and certainty_value as a pair like: &gt;&gt;&gt; x = [('one', 20), ('two', 2), ('three', 50)], the maximum certainty value (max) and the sum of all certainties.''' list = [random.uniform(0, 1) for i in range(choices)] (n, list) = relavate(list.sort()) keys = [] sum = max*len(list)-sum for key, certainty in dict.iteritems(): weight = float(max-certainty)/sum if n &lt; weight: keys.append(key) if list: (n, list) = relavate(list) else: break n = n - weight return keys def relavate(list): min = list[0] new = [l - min for l in list[1:]] return (min, new) </code></pre> <p>I haven't tried it out yet. If you have any comments/suggestions, please do not hesitate. Thx!</p> <p>Update3:</p> <p>I have been working all day on a task-tailored version of Rex Logan's answer. Instead of a 2 arrays of objects and weights, it is actually a special dictionary class; which makes things quite complex since Rex's code generates a random index... I also coded a test case that kind of resembles what will happen in my algo (but I can't really know until I try!). The basic principle is: the more a key is randomly generated often, the more unlikely it will be generated again: </p> <pre><code>import random, time import psyco psyco.full() class ProbDict(): """ Modified version of Rex Logans RandomObject class. The more a key is randomly chosen, the more unlikely it will further be randomly chosen. """ def __init__(self,keys_weights_values={}): self._kw=keys_weights_values self._keys=self._kw.keys() self._len=len(self._keys) self._findSeniors() self._effort = 0.15 self._fails = 0 def __iter__(self): return self.next() def __getitem__(self, key): return self._kw[key] def __setitem__(self, key, value): self.append(key, value) def __len__(self): return self._len def next(self): key=self._key() while key: yield key key = self._key() def __contains__(self, key): return key in self._kw def items(self): return self._kw.items() def pop(self, key): try: (w, value) = self._kw.pop(key) self._len -=1 if w == self._seniorW: self._seniors -= 1 if not self._seniors: #costly but unlikely: self._findSeniors() return [w, value] except KeyError: return None def popitem(self): return self.pop(self._key()) def values(self): values = [] for key in self._keys: try: values.append(self._kw[key][1]) except KeyError: pass return values def weights(self): weights = [] for key in self._keys: try: weights.append(self._kw[key][0]) except KeyError: pass return weights def keys(self, imperfect=False): if imperfect: return self._keys return self._kw.keys() def append(self, key, value=None): if key not in self._kw: self._len +=1 self._kw[key] = [0, value] self._keys.append(key) else: self._kw[key][1]=value def _key(self): for i in range(int(self._effort*self._len)): ri=random.randint(0,self._len-1) #choose a random object rx=random.uniform(0,self._seniorW) rkey = self._keys[ri] try: w = self._kw[rkey][0] if rx &gt;= w: # test to see if that is the value we want w += 1 self._warnSeniors(w) self._kw[rkey][0] = w return rkey except KeyError: self._keys.pop(ri) # if you do not find one after 100 tries then just get a random one self._fails += 1 #for confirming effectiveness only for key in self._keys: if key in self._kw: w = self._kw[key][0] + 1 self._warnSeniors(w) self._kw[key][0] = w return key return None def _findSeniors(self): '''this function finds the seniors, counts them and assess their age. It is costly but unlikely.''' seniorW = 0 seniors = 0 for w in self._kw.itervalues(): if w &gt;= seniorW: if w == seniorW: seniors += 1 else: seniorsW = w seniors = 1 self._seniors = seniors self._seniorW = seniorW def _warnSeniors(self, w): #a weight can only be incremented...good if w &gt;= self._seniorW: if w == self._seniorW: self._seniors+=1 else: self._seniors = 1 self._seniorW = w def test(): #test code iterations = 200000 size = 2500 nextkey = size pd = ProbDict(dict([(i,[0,i]) for i in xrange(size)])) start = time.clock() for i in xrange(iterations): key=pd._key() w=pd[key][0] if random.randint(0,1+pd._seniorW-w): #the heavier the object, the more unlikely it will be removed pd.pop(key) probAppend = float(500+(size-len(pd)))/1000 if random.uniform(0,1) &lt; probAppend: nextkey+=1 pd.append(nextkey) print (time.clock()-start)*1000/iterations, "msecs / iteration with", pd._fails, "failures /", iterations, "iterations" weights = pd.weights() weights.sort() print "avg weight:", float(sum(weights))/pd._len, max(weights), pd._seniorW, pd._seniors, len(pd), len(weights) print weights test() </code></pre> <p>Any comments are still welcome. @Darius: your binary trees are too complex and complicated for me; and I do not think its leafs can be removed efficiently... Thx all</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload