Uncompressing a ZIP file in memory in Java
<p>I'm downloading zipped files containing XMLs, and I'd like to avoid writing the zip files to disk before manipulating them because of latency requirements. However, <code>java.util.zip</code> doesn't suffice for me. There's no way to say "here's a byte array of a zip file, use it" without turning it into a stream, and <code>ZipInputStream</code> is not reliable, since it scans for entry headers (see the discussion under EDIT below for why that is not reliable).</p>
<p>I do not yet have access to the zip files I'll be handling, so I don't know whether <code>ZipInputStream</code> will be able to read them. I need a solution that works with any valid ZIP file, as the penalty for a failure once I go into production will be high.</p>
<p>Assuming <code>ZipInputStream</code> won't work, what can I do to solve this problem in cases where there are no usable entry headers? I'm using <a href="http://en.wikipedia.org/wiki/Zip_%28file_format%29#Structure" rel="nofollow noreferrer">Wikipedia's definition</a>, which includes a comment on how to correctly uncompress zip files (quoted below), as the standard.</p>
<p><strong>EDIT</strong></p>
<p>The Apache Commons Compress library has a <a href="http://commons.apache.org/compress/zip.html" rel="nofollow noreferrer">good write-up</a> on some of the problems that stream-based reading (both their solution and Java's) has. I'll further add, from Wikipedia and personal experience, that the size and CRC fields in entry headers may not be filled in (I have files with -1 in these fields). Thanks to <a href="https://stackoverflow.com/users/411846/centic">centic</a> for providing this link.</p>
<p>Also, let me quote Wikipedia on the subject:</p>
<blockquote> <p>Tools that correctly read zip archives must scan for the signatures of the various fields, the zip central directory. They must not scan for entries because only the directory specifies where a file chunk starts. Scanning could lead to false positives, as the format doesn't forbid other data to be between chunks, or uncompressed stream containing such signatures.</p> </blockquote>
<p>Note that <code>ZipInputStream</code> scans for entries rather than reading the central directory, which is the problem with it.</p>
<p><strong>Final Edit</strong></p>
<p>If anyone is interested, <a href="https://gist.github.com/3800777" rel="nofollow noreferrer">this script</a> can be used to produce, from an existing ZIP file, a valid ZIP file that cannot be read by <code>ZipInputStream</code>. So, as a final edit to this closed question, I needed a library that can read files such as the ones produced by this script.</p>
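<p>To make the central-directory approach described in the quote concrete, here is a minimal sketch that reads a ZIP held in a byte array using only the JDK. The class name <code>InMemoryZip</code> and the method <code>readAll</code> are hypothetical names chosen for illustration; they are not part of <code>java.util.zip</code> or of any library mentioned above. The sketch assumes a single-disk archive without ZIP64 or encryption, entries under 2 GB, UTF-8 entry names, and only the "stored" and "deflate" compression methods. It does not verify CRCs.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper: reads a ZIP from memory via its central directory.
final class InMemoryZip {
    private static final int EOCD_SIG = 0x06054b50; // end of central directory record
    private static final int CEN_SIG = 0x02014b50;  // central directory file header
    private static final int LOC_SIG = 0x04034b50;  // local file header

    /** Returns entry name -> uncompressed bytes, in central directory order. */
    static Map<String, byte[]> readAll(byte[] zip) throws DataFormatException {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // Search backwards for the EOCD record: 22 fixed bytes plus a comment of up to 65535 bytes.
        int eocd = -1;
        for (int i = zip.length - 22; i >= 0 && i >= zip.length - 22 - 0xFFFF; i--) {
            if (buf.getInt(i) == EOCD_SIG) { eocd = i; break; }
        }
        if (eocd < 0) throw new IllegalArgumentException("no end of central directory record");

        int count = buf.getShort(eocd + 10) & 0xFFFF; // total number of entries
        int pos = buf.getInt(eocd + 16);              // offset of the central directory

        Map<String, byte[]> entries = new LinkedHashMap<>();
        for (int n = 0; n < count; n++) {
            if (buf.getInt(pos) != CEN_SIG) throw new IllegalArgumentException("bad central directory header");
            int method = buf.getShort(pos + 10) & 0xFFFF;
            int compressedSize = buf.getInt(pos + 20);  // sizes come from the directory,
            int size = buf.getInt(pos + 24);            // not from the local header
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            int localOffset = buf.getInt(pos + 42);
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);

            // The local header is consulted only to skip its variable-length name and extra fields.
            if (buf.getInt(localOffset) != LOC_SIG) throw new IllegalArgumentException("bad local header");
            int dataStart = localOffset + 30
                    + (buf.getShort(localOffset + 26) & 0xFFFF)
                    + (buf.getShort(localOffset + 28) & 0xFFFF);

            byte[] data = new byte[size];
            if (method == 0) {            // stored
                System.arraycopy(zip, dataStart, data, 0, size);
            } else if (method == 8) {     // raw deflate
                Inflater inflater = new Inflater(true);
                // The Inflater docs ask for one extra input byte in "nowrap" mode; supply it if present.
                inflater.setInput(zip, dataStart, Math.min(compressedSize + 1, zip.length - dataStart));
                int off = 0;
                while (off < size) {
                    int r = inflater.inflate(data, off, size - off);
                    if (r == 0) break;    // finished or out of input
                    off += r;
                }
                inflater.end();
                if (off != size) throw new DataFormatException("truncated entry: " + name);
            } else {
                throw new UnsupportedOperationException("compression method " + method);
            }
            entries.put(name, data);
            pos += 46 + nameLen + extraLen + commentLen; // advance to the next directory header
        }
        return entries;
    }
}
```

<p>Calling <code>InMemoryZip.readAll(bytes)</code> on the downloaded byte array returns each entry's contents without touching the disk. Because the sizes and offsets come from the central directory, the sketch should not be affected by local headers whose size and CRC fields are left unfilled, which is the case that trips up entry scanning. For production use, a maintained library that handles ZIP64, encryption, and CRC checks is likely the safer choice.</p>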
 
