StackOverflow2013


plurals

POEfficient algorithm to randomly select items with frequency
primarykey
Id
872563
data
AcceptedAnswerId
873430
AnswerCount
3
ClosedDate
CommentCount
5
CommunityOwnedDate
CreationDate
2009-05-16T14:48:41.430
FavoriteCount
3
LastActivityDate
2015-01-16T12:56:38.060
LastEditDate
2009-05-18T03:47:14.483
LastEditorUserId
105597
OwnerUserId
9859
ParentId
0
PostTypeId
1
Score
10
ViewCount
1545
LastEditorDisplayName
text
Body
Given an array of <code>n</code> word-frequency pairs: <pre>[ (w<sub>0</sub>, f<sub>0</sub>), (w<sub>1</sub>, f<sub>1</sub>), ..., (w<sub>n-1</sub>, f<sub>n-1</sub>) ]</pre> where <code>w<sub>i</sub></code> is a word, <code>f<sub>i</sub></code> is an integer frequency, and the sum of the frequencies <code>∑f<sub>i</sub> = m</code>, I want to use a pseudo-random number generator (pRNG) to select <code>p</code> words <code>w<sub>j<sub>0</sub></sub>, w<sub>j<sub>1</sub></sub>, ..., w<sub>j<sub>p-1</sub></sub></code> such that the probability of selecting any word is proportional to its frequency: <pre>P(w<sub>i</sub> = w<sub>j<sub>k</sub></sub>) = P(i = j<sub>k</sub>) = f<sub>i</sub> / m</pre> (Note that this is selection with replacement, so the same word could be chosen every time.) I've come up with three algorithms so far: <ol> <li>Create an array of size <code>m</code>, and populate it so the first <code>f<sub>0</sub></code> entries are <code>w<sub>0</sub></code>, the next <code>f<sub>1</sub></code> entries are <code>w<sub>1</sub></code>, and so on, so the last <code>f<sub>n-1</sub></code> entries are <code>w<sub>n-1</sub></code>.<pre>[ w<sub>0</sub>, ..., w<sub>0</sub>, w<sub>1</sub>, ..., w<sub>1</sub>, ..., w<sub>n-1</sub>, ..., w<sub>n-1</sub> ]</pre> Then use the pRNG to select <code>p</code> indices in the range <code>0...m-1</code>, and report the words stored at those indices. This takes <code>O(n + m + p)</code> work, which isn't great, since <code>m</code> can be much, much larger than <code>n</code>.</li> <li>Step through the input array once, computing<pre>m<sub>i</sub> = ∑<sub>h≤i</sub> f<sub>h</sub> = m<sub>i-1</sub> + f<sub>i</sub></pre> and after computing <code>m<sub>i</sub></code>, use the pRNG to generate a number <code>x<sub>k</sub></code> in the range <code>0...m<sub>i</sub>-1</code> for each <code>k</code> in <code>0...p-1</code> and select <code>w<sub>i</sub></code> for <code>w<sub>j<sub>k</sub></sub></code> (possibly replacing the current value of <code>w<sub>j<sub>k</sub></sub></code>) if <code>x<sub>k</sub> < f<sub>i</sub></code>. 
This requires <code>O(n + np)</code> work.</li> <li>Compute <code>m<sub>i</sub></code> as in algorithm 2, and generate the following array of <code>n</code> word-frequency-partial-sum triples:<pre>[ (w<sub>0</sub>, f<sub>0</sub>, m<sub>0</sub>), (w<sub>1</sub>, f<sub>1</sub>, m<sub>1</sub>), ..., (w<sub>n-1</sub>, f<sub>n-1</sub>, m<sub>n-1</sub>) ]</pre> and then, for each <code>k</code> in <code>0...p-1</code>, use the pRNG to generate a number <code>x<sub>k</sub></code> in the range <code>0...m-1</code>, then do a binary search on the array of triples to find the <code>i</code> such that <code>m<sub>i</sub>-f<sub>i</sub> ≤ x<sub>k</sub> < m<sub>i</sub></code>, and select <code>w<sub>i</sub></code> for <code>w<sub>j<sub>k</sub></sub></code>. This requires <code>O(n + p log n)</code> work.</li> </ol> My question is: Is there a more efficient algorithm I can use for this, or are these as good as it gets?
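For concreteness, algorithms 2 and 3 from the question could be sketched in Python roughly as follows. This is only an illustration under some assumptions not stated in the post: the input is a non-empty list of `(word, frequency)` tuples with non-negative integer frequencies and a positive total, and the function names `sample_by_partial_sums` and `sample_single_pass` are hypothetical.

```python
import bisect
import itertools
import random


def sample_by_partial_sums(pairs, p, rng=random):
    """Algorithm 3 (sketch): O(n) partial sums, then one binary search per draw."""
    words = [w for w, _ in pairs]
    # partial[i] = f_0 + ... + f_i, i.e. m_i in the question's notation.
    partial = list(itertools.accumulate(f for _, f in pairs))
    m = partial[-1]
    picks = []
    for _ in range(p):
        x = rng.randrange(m)  # uniform integer in 0 .. m-1
        # First index i with partial[i] > x, i.e. m_i - f_i <= x < m_i.
        i = bisect.bisect_right(partial, x)
        picks.append(words[i])
    return picks


def sample_single_pass(pairs, p, rng=random):
    """Algorithm 2 (sketch): one pass over the input, O(n * p) random draws."""
    picks = [None] * p
    running = 0  # m_i, the partial sum up to the current word
    for word, freq in pairs:
        if freq == 0:
            continue  # a zero-frequency word can never be selected
        running += freq
        for k in range(p):
            # Replace slot k with probability f_i / m_i.
            if rng.randrange(running) < freq:
                picks[k] = word
    return picks
```

Both functions accept an optional `rng` (for example a seeded `random.Random`) so that runs can be reproduced. The binary-search version only needs the partial sums, so the sketch stores those instead of full triples.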
Tags
<algorithm><random><big-o>
Title
Efficient algorithm to randomly select items with frequency
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USkcwu
UserOwnerUserId
1. USrampion
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POEfficient algorithm to randomly select items with frequency
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POEfficient algorithm to randomly select items with frequency
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POEfficient algorithm to randomly select items with frequency
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId