Compression formats with good support for random access within archives?
    <p>This is similar to a <a href="https://stackoverflow.com/questions/236414/">previous question</a>, but the answers there don't satisfy my needs and my question is slightly different:</p>
    <p>I currently use gzip compression for some very large files which contain sorted data. When the files are not compressed, binary search is a handy and efficient way to support seeking to a location in the sorted data.</p>
    <p>But when the files are compressed, things get tricky. I recently found out about <a href="http://www.zlib.net/" rel="noreferrer">zlib</a>'s <code>Z_FULL_FLUSH</code> option, which can be used during compression to insert "sync points" in the compressed output (<code>inflateSync()</code> can then begin reading from various points in the file). This is OK, though files I already have would have to be recompressed to add this feature (and strangely <code>gzip</code> doesn't have an option for this, but I'm willing to write my own compression program if I must).</p>
    <p>It seems from <a href="http://newsgroups.derkeiler.com/Archive/Comp/comp.compression/2006-02/msg00327.html" rel="noreferrer">one source</a> that even <code>Z_FULL_FLUSH</code> is not a perfect solution...not only is it not supported by all gzip archives, but the very idea of detecting sync points in archives may produce false positives (either by coincidence with the magic number for sync points, or due to the fact that <code>Z_SYNC_FLUSH</code> also produces sync points but they are not usable for random access).</p>
    <p>Is there a better solution? I'd like to avoid having auxiliary files for indexing if possible, and explicit, default support for quasi-random access would be helpful (even if it's large-grained--like being able to start reading at each 10 MB interval). Is there another compression format with better support for random reads than gzip?</p>
    <p><strong>Edit</strong>: As I mentioned, I wish to do binary search in the compressed data. I don't need to seek to a specific (uncompressed) position--only to seek with some coarse granularity within the compressed file. I just want support for something like "Decompress the data starting roughly 50% (25%, 12.5%, etc.) of the way into this compressed file."</p>
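
To make the Z_FULL_FLUSH idea above concrete, here is a minimal sketch using Python's zlib module. It makes several assumptions that are not part of the question: it writes a raw deflate stream (wbits=-15) rather than a gzip file, the helper names compress_with_sync_points and decompress_from are hypothetical, and the flush offsets are simply kept in an in-memory list instead of being detected by scanning, so it does not address the false-positive concern.

```python
import zlib


def compress_with_sync_points(data: bytes, chunk_size: int):
    """Raw-deflate `data`, issuing Z_FULL_FLUSH after every chunk.

    Returns (compressed, offsets), where offsets[i] is the position in
    `compressed` at which the i-th uncompressed chunk begins.
    (Hypothetical helper, for illustration only.)
    """
    # wbits=-15 selects a raw deflate stream with no zlib/gzip header.
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        # Everything before this point has been fully flushed, so the
        # current length of `out` is a byte-aligned restart point.
        offsets.append(len(out))
        out += comp.compress(data[start:start + chunk_size])
        # A full flush ends the current block on a byte boundary and
        # resets the dictionary, so later data never refers back past it.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # default mode is Z_FINISH: emit the final block
    return bytes(out), offsets


def decompress_from(compressed: bytes, offset: int) -> bytes:
    """Decompress from a full-flush point to the end of the stream."""
    decomp = zlib.decompressobj(-15)
    return decomp.decompress(compressed[offset:])


if __name__ == "__main__":
    # Sorted sample data, standing in for the "very large sorted files".
    data = b"".join(b"%08d\n" % i for i in range(20000))
    compressed, offsets = compress_with_sync_points(data, chunk_size=32768)
    middle = len(offsets) // 2
    tail = decompress_from(compressed, offsets[middle])
    print(tail[:9])  # first record at or after the chosen flush point
```

A binary search could then probe offsets[middle], decompress only the first record from that point, and compare it with the search key. Storing the offsets trades away the "no auxiliary index" wish in exchange for not having to guess where the sync points are.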