StackOverflow2013

How to advance past a deflate byte sequence contained in a byte stream?
<p>I have a byte stream that is a concatenation of sections, where each section is composed of a header plus a deflated byte stream.</p>
<p>I need to split this byte stream into sections, but the header only describes the data in its uncompressed form. It gives no hint about the compressed length, which is what I would need to advance in the stream and parse the next section.</p>
<p>So far the only way I have found to advance past the deflated byte sequence is to parse it according to <a href="http://www.ietf.org/rfc/rfc1951.txt" rel="nofollow">this specification</a> (RFC 1951). From what I understood from the specification, a deflate stream is composed of blocks, which can be either compressed blocks or literal (stored) blocks.</p>
<p>Literal blocks contain a size header, which makes it easy to advance past them.</p>
<p>Compressed blocks are made of 'prefix codes', which are variable-length bit sequences that have special meanings to the deflate algorithm. Since I'm only interested in finding out the length of the deflated stream, I guess the only code I need to look for is '0000000', which according to the specification signals the end of a block.</p>
<p>So I came up with this CoffeeScript function to parse the deflate stream (I'm working on Node.js):</p>
<pre><code># The job of this function is to return the position
# after the deflate stream contained in 'buffer'. The
# deflated stream begins at 'pos'.
advanceDeflateStream = (buffer, pos) ->
  byteOffset = 0
  finalBlock = false
  while 1
    if byteOffset == 6
      firstTypeBit = 0b00000001 & buffer[pos]
      pos++
      secondTypeBit = 0b10000000 & buffer[pos]
      type = firstTypeBit | (secondTypeBit << 1)
    else
      if byteOffset == 7
        pos++
      type = buffer[pos] & (0b01100000 >>> byteOffset)
    if type == 0
      # Literal block
      # ignore the remaining bits and advance position
      byteOffset = 0
      pos++
      len = buffer.readUInt16LE(pos)
      pos += 2
      lenComplement = buffer.readUInt16LE(pos)
      if (len ^ ~lenComplement)
        throw new Error('Literal block length check failed')
      pos += (2 + len) # Advance past literal block
    else if type in [1, 2]
      # Huffman block
      # we are only interested in finding the 'block end' marker
      # which is signaled by the bit string 0000000 (256)
      eob = false
      matchedZeros = 0
      while !eob
        byte = buffer[pos]
        for i in [byteOffset..7]
          # loop the remaining bits looking for 7 consecutive zeros
          if (byte ^ (0b10000000 >>> byteOffset)) >>> (7 - byteOffset)
            matchedZeros++
          else
            # reset counter
            matchedZeros = 0
          if matchedZeros == 7
            eob = true
            break
          byteOffset++
        if !eob
          byteOffset = 0
          pos++
    else
      throw new Error('Invalid deflate block')
    finalBlock = buffer[pos] & (0b10000000 >>> byteOffset)
    if finalBlock
      break
  return pos
</code></pre>
<p>To check whether this works, I wrote a simple Mocha test case:</p>
<pre><code>zlib = require 'zlib'

test 'sample deflate stream', (done) ->
  data = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' # length 30
  zlib.deflate data, (err, deflated) ->
    # deflated.length == 11
    advanceDeflateStream(deflated, 0).should.eql(11)
    done()
</code></pre>
<p>The problem is that this test fails and I do not know how to debug it. I will accept any answer that points out what I missed in the parsing algorithm or that contains a correct version of the above function in any language.</p>
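<p>On the bit order: RFC 1951 (section 3.1.1) packs data elements into bytes starting with the least-significant bit, so fields such as BFINAL, BTYPE, LEN and the extra bits are read low bit first; only Huffman codes are packed starting with the most-significant bit of the code. The masks in the function above, such as <code>0b10000000</code> shifted right by <code>byteOffset</code>, walk each byte from its most-significant end instead. Below is a minimal sketch, written in TypeScript rather than CoffeeScript, of an LSB-first bit reader and of the 3-bit header that starts every block. <code>BitReader</code> and <code>readBlockHeader</code> are hypothetical names, and the input is assumed to be raw deflate data in a <code>Uint8Array</code>.</p>

```typescript
// LSB-first bit reader for raw RFC 1951 data (hypothetical helper, not part of Node's API).
class BitReader {
  private data: Uint8Array;
  bytePos: number; // index of the byte that holds the next unread bit
  bitPos = 0; // 0..7: position of the next unread bit inside data[bytePos]

  constructor(data: Uint8Array, start = 0) {
    this.data = data;
    this.bytePos = start;
  }

  // Read n bits as an unsigned number; the first bit read becomes bit 0 of the result.
  bits(n: number): number {
    let value = 0;
    for (let i = 0; i < n; i++) {
      if (this.bytePos >= this.data.length) throw new Error("unexpected end of data");
      const bit = (this.data[this.bytePos] >>> this.bitPos) & 1; // low bits are consumed first
      value |= bit << i;
      this.bitPos++;
      if (this.bitPos === 8) {
        this.bitPos = 0;
        this.bytePos++;
      }
    }
    return value;
  }
}

// Each block starts with BFINAL (1 bit) followed by BTYPE (2 bits):
// 0 = stored, 1 = fixed Huffman codes, 2 = dynamic Huffman codes, 3 = reserved (error).
function readBlockHeader(r: BitReader): { bfinal: number; btype: number } {
  const bfinal = r.bits(1);
  const btype = r.bits(2);
  return { bfinal, btype };
}

const header = readBlockHeader(new BitReader(new Uint8Array([0x03, 0x00])));
console.log(header); // { bfinal: 1, btype: 1 }
```

<p>In that example, the bytes <code>0x03 0x00</code> form an empty final block with fixed codes: bit 0 of the first byte is BFINAL = 1, bits 1 and 2 give BTYPE = 1, and the next seven bits (all zero) are the end-of-block code.</p>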

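<p>For stored ("literal") blocks, section 3.2.4 of the specification says that the remaining bits of the current byte are ignored after the 3 header bits. LEN and NLEN follow as 16-bit little-endian values, with NLEN being the one's complement of LEN, and then come LEN bytes of data. One JavaScript detail affects the length check: <code>~</code> works on 32-bit signed integers, so for a valid pair <code>len ^ ~lenComplement</code> still has its upper 16 bits set and is never zero. A minimal sketch of skipping one stored block follows, assuming the caller has already read BTYPE = 00 and knows the byte and bit where the block header started; <code>skipStoredBlock</code> is a hypothetical helper.</p>

```typescript
// Skip one stored (BTYPE = 00) block. `pos` is the byte holding the first header bit and
// `bitPos` (0..7) is that bit's position; the caller is assumed to have read BTYPE already.
// Returns the offset of the first byte after the block's data. Hypothetical helper.
function skipStoredBlock(buf: Uint8Array, pos: number, bitPos: number): number {
  // After the 3 header bits, the rest of the current byte is ignored (RFC 1951, 3.2.4).
  let p = pos + Math.ceil((bitPos + 3) / 8);
  if (p + 4 > buf.length) throw new Error("truncated stored block header");
  const len = buf[p] | (buf[p + 1] << 8); // 16-bit little-endian
  const nlen = buf[p + 2] | (buf[p + 3] << 8);
  // NLEN must equal the one's complement of LEN. Mask to 16 bits, because in JavaScript
  // ~len yields a 32-bit signed integer whose upper 16 bits are all ones.
  if (nlen !== (~len & 0xffff)) throw new Error("stored block LEN/NLEN mismatch");
  p += 4 + len;
  if (p > buf.length) throw new Error("truncated stored block data");
  return p;
}

const abc = new Uint8Array([0x01, 0x03, 0x00, 0xfc, 0xff, 0x61, 0x62, 0x63]);
console.log(skipStoredBlock(abc, 0, 0)); // 8
```

<p>Here <code>abc</code> is a final stored block holding the three bytes "abc": one header byte, LEN = 3, NLEN = 0xfffc, then the data, so the block ends at offset 8.</p>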

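<p>For compressed blocks, searching for seven zero bits does not seem to be enough. <code>0000000</code> is the end-of-block code only in blocks that use the fixed codes (BTYPE = 01). A block with dynamic codes (BTYPE = 10) transmits its own code lengths, so its end-of-block code can be different. Even with fixed codes, a run of seven zero bits can straddle code boundaries (for example, the 5-bit distance code <code>00000</code> followed by a code that starts with zeros), and lengths and distances carry extra bits that are not codes at all. The sketch below therefore decodes every symbol and throws it away, which appears to be the dependable way to find where the stream ends. Its structure loosely follows zlib's <code>puff.c</code> reference decoder. The names are illustrative, validation is minimal, and it expects raw deflate data with no zlib or gzip wrapper.</p>

```typescript
// Skip a raw RFC 1951 stream by decoding and discarding every block (structure loosely follows zlib's puff.c).
type Huffman = { count: number[]; symbol: number[] };
// Extra-bit counts for length symbols 257..285 and distance symbols 0..29 (RFC 1951, 3.2.5).
const LEN_EXTRA = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0];
const DIST_EXTRA = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13];
// Order of the code-length code lengths in a dynamic block header (3.2.7).
const CL_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15];
// Fixed literal/length code lengths (3.2.6); the 30 fixed distance codes are all 5 bits.
const FIXED_LIT = Array.from({ length: 288 }, (_, s) => (s < 144 ? 8 : s < 256 ? 9 : s < 280 ? 7 : 8));
const FIXED_DIST = new Array<number>(30).fill(5);

// Canonical Huffman table: count[len] = number of codes per length, symbol[] in code order.
function buildHuffman(lengths: number[]): Huffman {
  const count = new Array<number>(16).fill(0);
  const offs = new Array<number>(16).fill(0);
  for (const len of lengths) count[len]++;
  for (let len = 1; len < 15; len++) offs[len + 1] = offs[len] + count[len];
  const symbol: number[] = [];
  lengths.forEach((len, sym) => { if (len !== 0) symbol[offs[len]++] = sym; });
  return { count, symbol };
}

// Returns the byte offset just past the raw deflate stream that starts at `start`.
function advanceDeflateStream(buf: Uint8Array, start: number): number {
  let pos = start; // byte holding the next unread bit
  let bit = 0; // index (0..7) of the next unread bit inside buf[pos]
  // Read n bits, least-significant bit first (3.1.1).
  const bits = (n: number): number => {
    let value = 0;
    for (let i = 0; i < n; i++) {
      if (pos >= buf.length) throw new Error("unexpected end of data");
      value |= ((buf[pos] >>> bit) & 1) << i;
      if (++bit === 8) { bit = 0; pos++; }
    }
    return value;
  };
  const align = (): void => { if (bit !== 0) { bit = 0; pos++; } };

  // Decode one symbol; the bits of a Huffman code arrive most-significant first.
  const decode = (h: Huffman): number => {
    let code = 0, first = 0, index = 0;
    for (let len = 1; len <= 15; len++) {
      code |= bits(1);
      const count = h.count[len];
      if (code - count < first) return h.symbol[index + (code - first)];
      index += count;
      first = (first + count) << 1;
      code <<= 1;
    }
    throw new Error("invalid Huffman code");
  };
  // Dynamic block header: code-length code, then run-length coded code lengths (3.2.7).
  const dynamicTables = (): Huffman[] => {
    const nlen = bits(5) + 257, ndist = bits(5) + 1, ncode = bits(4) + 4;
    const clLengths = new Array<number>(19).fill(0);
    for (let i = 0; i < ncode; i++) clLengths[CL_ORDER[i]] = bits(3);
    const clCode = buildHuffman(clLengths);
    const lengths: number[] = [];
    while (lengths.length < nlen + ndist) {
      const sym = decode(clCode);
      if (sym < 16) { lengths.push(sym); continue; }
      if (sym === 16 && lengths.length === 0) throw new Error("repeat with no previous length");
      const value = sym === 16 ? lengths[lengths.length - 1] : 0;
      const repeat = sym === 16 ? 3 + bits(2) : sym === 17 ? 3 + bits(3) : 11 + bits(7);
      if (lengths.length + repeat > nlen + ndist) throw new Error("too many code lengths");
      for (let i = 0; i < repeat; i++) lengths.push(value);
    }
    return [buildHuffman(lengths.slice(0, nlen)), buildHuffman(lengths.slice(nlen))];
  };

  let bfinal = 0;
  while (bfinal === 0) {
    bfinal = bits(1);
    const btype = bits(2);
    if (btype === 0) {
      align(); // stored block: LEN and NLEN start on a byte boundary
      const len = bits(16), nlen = bits(16);
      if (nlen !== (~len & 0xffff)) throw new Error("stored block LEN/NLEN mismatch");
      if (pos + len > buf.length) throw new Error("unexpected end of data");
      pos += len;
    } else if (btype === 1 || btype === 2) {
      const [lit, dist] = btype === 1 ? [buildHuffman(FIXED_LIT), buildHuffman(FIXED_DIST)] : dynamicTables();
      // Consume every code and its extra bits until the end-of-block symbol (256).
      for (let sym = decode(lit); sym !== 256; sym = decode(lit)) {
        if (sym < 256) continue; // literal byte
        if (sym > 285) throw new Error("invalid length symbol");
        bits(LEN_EXTRA[sym - 257]);
        const d = decode(dist);
        if (d > 29) throw new Error("invalid distance symbol");
        bits(DIST_EXTRA[d]);
      }
    } else {
      throw new Error("reserved block type 3");
    }
  }
  align(); // padding bits after the last block belong to the stream's final byte
  return pos;
}
```

<p>As a check, the five hand-assembled bytes <code>0x4b 0x4c 0xc4 0x07 0x00</code> form one fixed-code block holding two literal 'a' bytes and a copy of length 28 at distance 1 (30 'a' characters in total); <code>advanceDeflateStream</code> returns 5 for them even when more bytes follow, which is what allows the next section to be parsed from that offset.</p>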

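<p>The test also passes <code>advanceDeflateStream</code> the output of Node's <code>zlib.deflate</code>, which is the zlib format (RFC 1950) rather than raw deflate: a 2-byte header (CMF, FLG), an optional 4-byte preset-dictionary ID, the raw deflate data, and a 4-byte Adler-32 checksum. That matches the 11 bytes noted in the test (2 header bytes, 5 bytes of deflate data and 4 checksum bytes, assuming no preset dictionary). Parsing from offset 0 therefore starts on the header byte rather than on a block header; <code>zlib.deflateRaw</code> produces the raw form. Below is a minimal sketch of skipping the wrapper, where <code>skipZlibStream</code> is a hypothetical name and <code>skipRawDeflate</code> stands for any function that returns the offset just past raw RFC 1951 data.</p>

```typescript
// Skip a zlib-format (RFC 1950) stream: 2-byte header, optional 4-byte DICTID,
// raw deflate data, 4-byte Adler-32 trailer. Names here are hypothetical.
type RawDeflateSkipper = (buf: Uint8Array, start: number) => number;

function skipZlibStream(buf: Uint8Array, start: number, skipRawDeflate: RawDeflateSkipper): number {
  if (start + 2 > buf.length) throw new Error("truncated zlib header");
  const cmf = buf[start];
  const flg = buf[start + 1];
  // CM (low nibble of CMF) must be 8 (deflate); CINFO (high nibble) must not exceed 7.
  if ((cmf & 0x0f) !== 8 || cmf >>> 4 > 7) throw new Error("not a zlib/deflate header");
  // FCHECK is chosen so that CMF * 256 + FLG is a multiple of 31.
  if (((cmf << 8) | flg) % 31 !== 0) throw new Error("zlib header check failed");
  // FDICT (bit 5 of FLG) signals a 4-byte preset dictionary ID after the header.
  const dataStart = start + 2 + ((flg & 0x20) !== 0 ? 4 : 0);
  const end = skipRawDeflate(buf, dataStart) + 4; // skip the Adler-32 checksum too
  if (end > buf.length) throw new Error("truncated zlib trailer");
  return end;
}

const emptyInput = new Uint8Array([0x78, 0x9c, 0x03, 0x00, 0x00, 0x00, 0x00, 0x01]);
// Stand-in skipper for this example only: it assumes the raw part is exactly 2 bytes.
console.log(skipZlibStream(emptyInput, 0, (_buf, s) => s + 2)); // 8
```

<p>The 8-byte example is a zlib stream for empty input: header <code>78 9c</code>, an empty final fixed-code block <code>03 00</code>, and the Adler-32 value 1 (the checksum of empty input), so the section ends at offset 8.</p>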