Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>This sounds like a job for suffix arrays/trees!</p> <p><a href="http://en.wikipedia.org/wiki/Suffix_array" rel="nofollow">http://en.wikipedia.org/wiki/Suffix_array</a></p> <p>You can use a suffix array built over your string to figure out patterns that repeat. For instance, we can build a suffix array over your example as follows (I'm using $ as always coming after every letter, you can sort it so that $ comes before every letter ... either way will work):</p> <pre> ABCADCADCADABC$ ABC$ ADABC$ ADCADABC$ ADCADCADABC$ BCADCADCADABC$ BC$ CADABC$ CADCADABC$ CADCADCADABC$ C$ DABC$ DCADABC$ DCADCADABC$ $ </pre> <p>From this, we can more easily see the common patterns in the string. Using the information in this suffix array representation, we can see that CAD is repeated 3x in a local area, and we'd likely use this as our choice for compression. ADC and DCA and so on are not as attractive because they compress less of the string.</p> <p><a href="http://en.wikipedia.org/wiki/Suffix_tree" rel="nofollow">http://en.wikipedia.org/wiki/Suffix_tree</a></p> <p>Suffix trees are more efficient ways of doing the same task. Once you wrap your head around how to do something using suffix arrays, it's not too far of a jump to go onto suffix trees. In fact, this is used in popular compression algorithms including LZW <a href="http://en.wikipedia.org/wiki/LZW" rel="nofollow">1</a> and BWT (Bzip) <a href="http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform" rel="nofollow">2</a>.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload