StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POProbability distribution in Python
primarykey
Id
526255
data
AcceptedAnswerId
526300
AnswerCount
12
ClosedDate
CommentCount
12
CommunityOwnedDate
2009-02-09T01:00:32.423
CreationDate
2009-02-08T19:52:22.713
FavoriteCount
22
LastActivityDate
2014-01-19T00:25:27.990
LastEditDate
2009-02-10T15:56:55.267
LastEditorUserId
49985
OwnerUserId
49985
ParentId
0
PostTypeId
1
Score
20
ViewCount
18490
LastEditorDisplayName
Nicholas Leonard
text
Body
I have a bunch of keys that each have an unlikeliness variable. I want to randomly choose one of these keys, yet I want it to be more unlikely for unlikely (key, values) to be chosen than a less unlikely (a more likely) object. I am wondering if you would have any suggestions, preferably an existing python module that I could use, else I will need to make it myself. I have checked out the random module; it does not seem to provide this. I have to make such choices many millions of times for 1000 different sets of objects each containing 2,455 objects. Each set will exchange objects among each other so the random chooser needs to be dynamic. With 1000 sets of 2,433 objects, that is 2,433 million objects; low memory consumption is crucial. And since these choices are not the bulk of the algorithm, I need this process to be quite fast; CPU-time is limited. Thx Update: Ok, I tried to consider your suggestions wisely, but time is so limited... I looked at the binary search tree approach and it seems too risky (complex and complicated). The other suggestions all resemble the ActiveState recipe. I took it and modified it a little in the hope of making more efficient: <pre><code>def windex(dict, sum, max): '''an attempt to make a random.choose() function that makes weighted choices accepts a dictionary with the item_key and certainty_value as a pair like: >>> x = [('one', 20), ('two', 2), ('three', 50)], the maximum certainty value (max) and the sum of all certainties.''' n = random.uniform(0, 1) sum = max*len(list)-sum for key, certainty in dict.iteritems(): weight = float(max-certainty)/sum if n < weight: break n = n - weight return key </code></pre> I am hoping to get an efficiency gain from dynamically maintaining the sum of certainties and the maximum certainty. Any further suggestions are welcome. You guys saves me so much time and effort, while increasing my effectiveness, it is crazy. Thx! Thx! Thx! Update2: I decided to make it more efficient by letting it choose more choices at once. This will result in an acceptable loss in precision in my algo for it is dynamic in nature. Anyway, here is what I have now: <pre><code>def weightedChoices(dict, sum, max, choices=10): '''an attempt to make a random.choose() function that makes weighted choices accepts a dictionary with the item_key and certainty_value as a pair like: >>> x = [('one', 20), ('two', 2), ('three', 50)], the maximum certainty value (max) and the sum of all certainties.''' list = [random.uniform(0, 1) for i in range(choices)] (n, list) = relavate(list.sort()) keys = [] sum = max*len(list)-sum for key, certainty in dict.iteritems(): weight = float(max-certainty)/sum if n < weight: keys.append(key) if list: (n, list) = relavate(list) else: break n = n - weight return keys def relavate(list): min = list[0] new = [l - min for l in list[1:]] return (min, new) </code></pre> I haven't tried it out yet. If you have any comments/suggestions, please do not hesitate. Thx! Update3: I have been working all day on a task-tailored version of Rex Logan's answer. Instead of a 2 arrays of objects and weights, it is actually a special dictionary class; which makes things quite complex since Rex's code generates a random index... I also coded a test case that kind of resembles what will happen in my algo (but I can't really know until I try!). The basic principle is: the more a key is randomly generated often, the more unlikely it will be generated again: <pre><code>import random, time import psyco psyco.full() class ProbDict(): """ Modified version of Rex Logans RandomObject class. The more a key is randomly chosen, the more unlikely it will further be randomly chosen. """ def __init__(self,keys_weights_values={}): self._kw=keys_weights_values self._keys=self._kw.keys() self._len=len(self._keys) self._findSeniors() self._effort = 0.15 self._fails = 0 def __iter__(self): return self.next() def __getitem__(self, key): return self._kw[key] def __setitem__(self, key, value): self.append(key, value) def __len__(self): return self._len def next(self): key=self._key() while key: yield key key = self._key() def __contains__(self, key): return key in self._kw def items(self): return self._kw.items() def pop(self, key): try: (w, value) = self._kw.pop(key) self._len -=1 if w == self._seniorW: self._seniors -= 1 if not self._seniors: #costly but unlikely: self._findSeniors() return [w, value] except KeyError: return None def popitem(self): return self.pop(self._key()) def values(self): values = [] for key in self._keys: try: values.append(self._kw[key][1]) except KeyError: pass return values def weights(self): weights = [] for key in self._keys: try: weights.append(self._kw[key][0]) except KeyError: pass return weights def keys(self, imperfect=False): if imperfect: return self._keys return self._kw.keys() def append(self, key, value=None): if key not in self._kw: self._len +=1 self._kw[key] = [0, value] self._keys.append(key) else: self._kw[key][1]=value def _key(self): for i in range(int(self._effort*self._len)): ri=random.randint(0,self._len-1) #choose a random object rx=random.uniform(0,self._seniorW) rkey = self._keys[ri] try: w = self._kw[rkey][0] if rx >= w: # test to see if that is the value we want w += 1 self._warnSeniors(w) self._kw[rkey][0] = w return rkey except KeyError: self._keys.pop(ri) # if you do not find one after 100 tries then just get a random one self._fails += 1 #for confirming effectiveness only for key in self._keys: if key in self._kw: w = self._kw[key][0] + 1 self._warnSeniors(w) self._kw[key][0] = w return key return None def _findSeniors(self): '''this function finds the seniors, counts them and assess their age. It is costly but unlikely.''' seniorW = 0 seniors = 0 for w in self._kw.itervalues(): if w >= seniorW: if w == seniorW: seniors += 1 else: seniorsW = w seniors = 1 self._seniors = seniors self._seniorW = seniorW def _warnSeniors(self, w): #a weight can only be incremented...good if w >= self._seniorW: if w == self._seniorW: self._seniors+=1 else: self._seniors = 1 self._seniorW = w def test(): #test code iterations = 200000 size = 2500 nextkey = size pd = ProbDict(dict([(i,[0,i]) for i in xrange(size)])) start = time.clock() for i in xrange(iterations): key=pd._key() w=pd[key][0] if random.randint(0,1+pd._seniorW-w): #the heavier the object, the more unlikely it will be removed pd.pop(key) probAppend = float(500+(size-len(pd)))/1000 if random.uniform(0,1) < probAppend: nextkey+=1 pd.append(nextkey) print (time.clock()-start)*1000/iterations, "msecs / iteration with", pd._fails, "failures /", iterations, "iterations" weights = pd.weights() weights.sort() print "avg weight:", float(sum(weights))/pd._len, max(weights), pd._seniorW, pd._seniors, len(pd), len(weights) print weights test() </code></pre> Any comments are still welcome. @Darius: your binary trees are too complex and complicated for me; and I do not think its leafs can be removed efficiently... Thx all
Tags
<python><algorithm><random><distribution><probability>
Title
Probability distribution in Python
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USNicholas Leonard
UserOwnerUserId
1. USNicholas Leonard
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
3. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POProbability distribution in Python
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POProbability distribution in Python
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POProbability distribution in Python
 UserUserId
 USmonkut
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.