Convert Huffman code string to binary
<p>I am having trouble converting a Huffman-encoded string to binary in Python.</p>
<p>This question does not involve the Huffman algorithm itself.</p>
<p>The situation is this:</p>
<p>I can get an encoded Huffman string, say <code>01010101010</code>. <strong>Note</strong> that it is a string.</p>
<p>Now I want to save that string representation as real binary data.</p>
<p>In the Huffman-encoded string, every 0 and 1 takes up a <strong>byte</strong>.</p>
<p>What I want is for every 0 and 1 to take up a <strong>bit</strong>.</p>
<p>How can I do that in Python?</p>
<p><strong>Edit 1:</strong></p>
<p>Please forgive me for not describing my problem clearly enough.</p>
<p>Let me explain my current approach to writing the zeros and ones as binary.</p>
<p>Say we have a code string <code>s='010101010'</code>.</p>
<ol>
<li>I use <code>int</code> to convert it to an integer.</li>
<li>Then I use <code>unichr</code> to convert it to a string so that I can write it to a file.</li>
<li>I write the string to a file in binary mode.</li>
</ol>
<p>Note also that I need to read the file back in order to decode the Huffman code.</p>
<p>So my approach for reading is:</p>
<ol>
<li>Read the bytes from the file.</li>
<li>Restore them to ints.</li>
<li>Convert the ints to their binary string representation.</li>
<li>Decode the string.</li>
</ol>
<p>The problem happens at <strong>step 2</strong>, and I am stuck.</p>
<p>Some Huffman strings are short (like <code>10</code>), while others are long (<code>010101010101001</code>). As a result, their int values have different byte lengths: a short string may take just <strong>one</strong> byte, while a long one can take <strong>two</strong> or even more.</p>
<p>The following code illustrates my problem:</p>
<pre><code>ss=['010101','10010101010']
# the first one is short and takes only one byte in its int value
# the second one is long and takes two bytes
print 'write it to file'
with open('binary.bin','wb') as f:
    for s in ss:
        n=int(s,2)
        print n
        s=unichr(n)
        f.write(s)

print 'read it from file'
with open('binary.bin','rb') as f:
    for s in f.read():
        print ord(s)
</code></pre>
<p>In the second <em>with</em> block I read one byte at a time, but this is not correct, because the string <code>10010101010</code> takes up two bytes.</p>
<p>So, when I read those bytes back from the file, how many bytes should I read at once?</p>
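<p>One possible way to sidestep the "how many bytes to read" question is sketched below, under some assumptions. Instead of writing each code as its own integer, it concatenates all the codes into a single bit string, pads it to a multiple of 8 bits, and stores the padding count in a leading header byte. The helper names <code>pack_bits</code> and <code>unpack_bits</code> and the one-byte header layout are illustrative choices, not part of the original code, and the sketch is written for Python 3 rather than the Python 2 used above.</p>

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte.

    Hypothetical layout: the first byte stores how many zero bits were
    appended to fill the last byte (0-7); the packed bits follow.
    """
    padding = (8 - len(bits) % 8) % 8
    padded = bits + "0" * padding
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([padding]) + body


def unpack_bits(data: bytes) -> str:
    """Inverse of pack_bits: recover the original '0'/'1' string."""
    padding = data[0]
    # format(byte, "08b") keeps leading zeros, which bin() would drop
    bits = "".join(format(byte, "08b") for byte in data[1:])
    return bits[:len(bits) - padding] if padding else bits


if __name__ == "__main__":
    codes = ["010101", "10010101010"]
    stream = "".join(codes)          # one continuous bit stream
    with open("binary.bin", "wb") as f:
        f.write(pack_bits(stream))
    with open("binary.bin", "rb") as f:
        restored = unpack_bits(f.read())
    print(restored == stream)
```

<p>Because Huffman codes are prefix-free, the decoder can walk the restored bit string from left to right without knowing where each code ends, so the per-code byte length never has to be stored.</p>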
 
