Note that there are some explanatory texts on larger screens.

plurals
  1. POMapping a list to a Huffman Tree whilst preserving relative order
    primarykey
    data
    text
    <p>I'm having an issue with a search algorithm over a Huffman tree: for a given probability distribution I need the Huffman tree to be identical regardless of permutations of the input data.</p> <p>Here is a picture of what's happening vs what I want: </p> <p><img src="https://i.stack.imgur.com/t4G7u.jpg" alt="Expectation vs reality"></p> <p>Basically I want to know if it's possible to preserve the relative order of the items from the list to the tree. If not, why is that so?</p> <p>For reference, I'm using the Huffman tree to generate sub groups according to a division of probability, so that I can run the search() procedure below. Notice that the data in the merge() sub-routine is combined, along with the weight. The codewords themselves aren't as important as the tree (which should preserve the relative order).</p> <p>For example if I generate the following Huffman codes:</p> <pre><code>probabilities = [0.30, 0.25, 0.20, 0.15, 0.10] items = ['a','b','c','d','e'] items = zip(items, probabilities) t = encode(items) d,l = hi.search(t) print(d) </code></pre> <p>Using the following Class:</p> <pre><code>class Node(object): left = None right = None weight = None data = None code = None def __init__(self, w,d): self.weight = w self.data = d def set_children(self, ln, rn): self.left = ln self.right = rn def __repr__(self): return "[%s,%s,(%s),(%s)]" %(self.data,self.code,self.left,self.right) def __cmp__(self, a): return cmp(self.weight, a.weight) def merge(self, other): total_freq = self.weight + other.weight new_data = self.data + other.data return Node(total_freq,new_data) def index(self, node): return node.weight def encode(symbfreq): pdb.set_trace() tree = [Node(sym,wt) for wt,sym in symbfreq] heapify(tree) while len(tree)&gt;1: lo, hi = heappop(tree), heappop(tree) n = lo.merge(hi) n.set_children(lo, hi) heappush(tree, n) tree = tree[0] def assign_code(node, code): if node is not None: node.code = code if isinstance(node, Node): assign_code(node.left, code+'0') assign_code(node.right, code+'1') assign_code(tree, '') return tree </code></pre> <p>I get: </p> <pre><code>'a'-&gt;11 'b'-&gt;01 'c'-&gt;00 'd'-&gt;101 'e'-&gt;100 </code></pre> <p>However, an assumption I've made in the search algorithm is that more probable items get pushed toward the left: that is I need 'a' to have the '00' codeword - and this should always be the case regardless of any permutation of the 'abcde' sequence. An example output is:</p> <pre><code>codewords = {'a':'00', 'b':'01', 'c':'10', 'd':'110', 'e':111'} </code></pre> <p>(N.b even though the codeword for 'c' is a suffix for 'd' this is ok). </p> <p>For completeness, here is the search algorithm:</p> <pre><code>def search(tree): print(tree) pdb.set_trace() current = tree.left other = tree.right loops = 0 while current: loops+=1 print(current) if current.data != 0 and current is not None and other is not None: previous = current current = current.left other = previous.right else: previous = other current = other.left other = other.right return previous, loops </code></pre> <p>It works by searching for the 'leftmost' 1 in a group of 0s and 1s - the Huffman tree has to put more probable items on the left. For example if I use the probabilities above and the input:</p> <pre><code>items = [1,0,1,0,0] </code></pre> <p>Then the index of the item returned by the algorithm is 2 - which isn't what should be returned (0 should, as it's leftmost).</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload