StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>Stream compression is different from file compression. When compressing a file, it's generally possible to make multiple passes over the entire file and determine which compression scheme to use before having to commit to one. When compressing a stream, it's often necessary to start outputting data before the compression routine has processed enough data to know what compression method is going to be optimal.</p> <p>This effect can be somewhat mitigated by dividing data into blocks, deciding for each block how to represent the data, and including a header at the start of each block identifying how it is stored. Unfortunately, the extra block headers will add to the size of the resulting stream. Further, many compression schemes improve in effectiveness as they process a stream; it may well be that every 1k block in a file would expand if "compressed" individually, even if compressing the whole file would result in a considerable space savings (since the compresser could e.g. build up a dictionary of common byte sequences). It would be possible to design a compress/uncompress pair so that a block of data which would expand would be written out verbatim by the compresser (with a header byte indicating that's what it was), and have the uncompresser process that block the same way the compresser could have done, so as to add to the dictionary the same byte sequences that would have been added had the block been stored in "compressed" form. Such an approach would probably be a good one, though it would add considerably to the complexity of the uncompresser.</p> <p>I suspect the biggest problem for DeflateStream, though, is that there may not be any way to improve the worst-case "compression" performance without producing compressed data that is incompatible with the existing "uncompress" code. Suppose one has a string of bytes Q, and one needs a sequence of bytes which, when fed to the "uncompress" code that shipped with .net 2.0, will yield that same sequence. It may well be that for some possible values of Q, there are no such input sequences which aren't a lot bigger than Q. If that's the case, there's no way Microsoft could "fix" the problem without a time machine.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload