StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>I can think of three basic methods, one of which involves frequent reguessing and one of which involves keeping extra information. I think that doing one or the other of these things is unavoidable. I'm going to begin with the extra information one:</p> <p>In each node, store a number <code>count</code> which represents the number of descendants it has. For every node, you'll need to have a number between 1 and <code>count</code> for that node to tell you whether to go left or right by comparing it to the left child's count. Here's the algorithm:</p> <pre><code>n := random integer between 1 and root.count node := route while node.count != 1 if n <= node.left.count node = node.left else node = node.right n = n - node.left.count </code></pre> <p>So, essentially, we're imposing a left-to-right ordering on all nodes and selecting the nth one from the left. This is fairly quick, only having a O(depth of tree), which is likely the best we can do without doing something like also building a vector which contains all the node labels. This also adds an overhead of O(depth of tree) to any changes to the tree since counts must be corrected. If you're going the other way and never changing the tree at all but going to be selecting random nodes a lot, just bit the bullet and put all of the node labels in a vector. That way you can select a random one in O(1) after O(N) initial set-up time.</p> <p>If, however, you don't want to use up any storage space, here's an alternative with a lot of reguessing. First find a bound (which I'll label B) for the depth of the tree (we can use N-1 if needed, but obviously, that's a very loose bound. The tighter the bound which can be found, the faster the algorithm runs). Next we're going to generate a possible node label in a random, but even way. There are 2^(B+1)-1 possibilities. It's not just 2^B because, for example, the string "0011" and "11" are completely different strings. As a result, we need to count all possible binary strings of length between 0 and B. Obviously, we have 2^i strings of length i. So for strings of length i or less, we have sum(i=0 to B){2^i} = 2^(B+1)-1. So, we can just chose a number between 0 and 2^(B+1)-2 and then find the corresponding node label. Of course, the mapping from numbers to node labels isn't trivial, so I'll provide it here.</p> <p>We convert the number we have chosen into a string of bits in the ordinary way. Then, reading from the left, if the first digit is a 0, then the node label is the remaining string to the right (possibly the empty string, which is a valid node label although not likely to be in use). If the first digit is a 1, then we throw it away and repeat this process. Thus, if B=4, then the node label "0001" would come from the number "00001". The node label "001" would come from the number "10001". The node label "01" would come from the number "11001". The node label "1" would come from the number "11101". And the node label "" would come from the number "11110". We did not include the number 2^(B+1)-1 ("11111" in this case) which has no valid interpretation under this scheme. I'll leave it as an exercise to the reader to prove to themselves that every string from length 0 to B can be represented under this scheme. Rather than trying to prove it, I'll just assert that it will work.</p> <p>So now we have a node label. The next step is to see if that label exists by traversing the tree. If it does, we're done. If it doesn't, then choose a new number and start over (that's the reguessing part). It's likely to have to reguess a lot, since only a small fraction of legal node labels will be in use, but this won't skew the fairness, just increase the time.</p> <p>Here's a pseudo-code version of this process in four functions:</p> <pre><code>function num_to_binary_list(n,digits) = if digits == 0 return () if n mod 2 == 0 return 0 :: num_to_digits(n/2,digits-1) else return 1 :: num_to_digits((n-1)/2,digits-1) function binary_list_to_node_label_list(l) = if l.head() == 0 return l.tail() else return binary_list_to_node_label_list(l.tail()) function check_node_label_list_against_tree(str,node) = if node == null return false,null if str.isEmpty() if node.isLeaf() return true,node else return false,null if str.head() == 0 return check_node_label_list_against_tree(str.tail(),node.left) else check_node_label_list_against_tree(str.tail,node.right) function generate_random_node tree b = found := false while (not found) x := random(0,2**(b+1)-2) // We're assuming that this random selects inclusively node_label := binary_list_to_node_label(num_to_binary_list(x,b+1)) found,node := check_node_label_list_against_tree(node_label,tree) return node </code></pre> <p>The timing analysis for this, of course, is pretty horrendous. Basically, the while loop will run an average of (2^(B+1)-1)/N times. So, in the worst case, it's O((2^N)/N) which is terrible. In the best case, B would be on the order of log(N), so it would be roughly O(1), but that requires that the tree be fairly balanced which it may not be. Still, if you really want no extra space, this method does that.</p> <p>I don't really think that you can do better than this last method without storing some information. It sounds appealing to be able to traverse the tree, making random decisions as you go, but without storing additional information about the structure, you're just not going to be able to do that. Every time you make a branching decision, you could have just one node on the left side and a million nodes on the right side or it could have a million nodes on the left side and just one on the right side. Because those are both possible and you don't know which is the case, there's simply no way to make an even random decision between the two sides. Obviously 50-50 doesn't work and any other choice is going to be similarly problematic.</p> <p>So, if you don't want extra space, the second method will work, but be slow. If you don't mind adding some extra space, the first method will work and be fast. And, as I said earlier, if you're not going to be changing the tree and you'll be selecting a lot of random nodes, then bite the bullet and just traverse the tree and stick all leaf nodes in a self-growing array or vector and then pick from that.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload