Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>The hard part in my mind is how you build your static dictionary. You don't want to use the LZW dictionary built from your sample data. LZW wastes a bunch of time learning since it can't build the dictionary faster than the decompressor can (a token will only be used the second time it's seen by the compressor so the decompressor can add it to its dictionary the first time its seen). The flip side of this is that it's adding things to the dictionary that may never get used, just in case the string shows up again. (e.g., to have a token for 'stackoverflow' you'll also have entries for 'ac','ko','ve','rf' etc...)</p> <p>However, looking at the raw token stream from an LZ77 algorithm could work well. You'll only see tokens for strings seen at least twice. You can then build a list of the most common tokens/strings to include in your dictionary.</p> <p>Once you have a static dictionary, using LZW sans the dictionary update seems like an easy implementation but to get the best compression I'd consider a static <a href="http://en.wikipedia.org/wiki/Huffman_coding" rel="nofollow noreferrer">Huffman</a> table instead of the traditional 12 bit fixed size token (as George Phillips suggested). An LZW dictionary will burn tokens for all the sub-strings you may never actually encode (e.g, if you can encode 'stackoverflow', there will be tokens for 'st', 'sta', 'stac', 'stack', 'stacko' etc.).</p> <p>At this point it really isn't LZW - what makes LZW clever is how the decompressor can build the same dictionary the compressor used only seeing the compressed data stream. Something you won't be using. But all LZW implementations have a state where the dictionary is full and is no longer updated, this is how you'd use it with your static dictionary.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload