StackOverflow2013

<p>EDIT: Another suggestion...</p> <p>Looking at <code>ZipFile</code> from the Apache Commons implementation, it looks like it wouldn't be <em>too</em> hard to effectively fork that for your project. Create a wrapper around your byte array which has all the pieces of the <code>RandomAccessFile</code> API which are required (I don't think there are very many). You've already indicated that you prefer the interface to <code>ZipFile</code>, so why not go with that?</p> <p>We don't know enough about your project to know whether this opens up any legal questions - and even if you gave details, I doubt that anyone here would be able to give good legal advice - but I suspect it wouldn't take more than an hour or two to get this solution up and working, and I suspect you'd have reasonable confidence in it.</p> <hr> <p>EDIT: This may be a slightly more productive answer...</p> <p>If you're worried about the entries not being contiguous, but don't want to handle all the compression side yourself, you might consider an option where you effectively <em>rewrite</em> the data. Create a new <code>ByteArrayOutputStream</code>, and read the central directory at the end. For each entry in the central directory, write out an entry (header + data) to the output stream in a format that you believe <code>ZipInputStream</code> will be happy with. Then write a new central directory - if you want your replacement to be valid you may need to do this from scratch, but if you're using code which you <em>know</em> won't actually read the central directory, you could just provide the original one, ignoring the fact that it might not then be valid.
So long as it starts with the right signature, that's probably good enough :)</p> <p>Once you've done that, convert the <code>ByteArrayOutputStream</code> into a <em>new</em> <code>byte[]</code>, wrap it in a <code>ByteArrayInputStream</code> and then pass that to <code>ZipInputStream</code> or <code>ZipArchiveInputStream</code>.</p> <p>Depending on your purposes, you may not even need to do that much - you may be able to just extract each file as you go by creating a "mini" zip file with just the one entry you're reading from the directory at a time.</p> <p>This <em>does</em> involve understanding the zip file format, but not completely - just the skeleton, effectively. It's not a quick and easy fix like using an existing API completely, but it shouldn't take <em>very</em> long. It doesn't guarantee it'll be able to read all invalid files (how could it?) but it will protect you against the "data between entries" issue you seem to be particularly concerned about. Hope it's at least a useful idea...</p> <hr> <blockquote> <p>there's no way to say "here's a byte array of a zip file, use it"</p> </blockquote> <p>Yes there is:</p> <pre><code>byte[] data = ...;
ByteArrayInputStream byteStream = new ByteArrayInputStream(data);
ZipInputStream zipStream = new ZipInputStream(byteStream);
</code></pre> <p>That leaves the issue of whether <code>ZipInputStream</code> can handle all the zip files you'll give it - but I wouldn't write it off quite so quickly.</p> <p>Of course, there are other APIs available. You may want to look at <a href="http://commons.apache.org/compress/" rel="noreferrer">Apache Commons Compress</a>, for example. Even though <code>ZipFile</code> requires a file, <a href="http://commons.apache.org/compress/apidocs/org/apache/commons/compress/archivers/zip/ZipArchiveInputStream.html" rel="noreferrer"><code>ZipArchiveInputStream</code></a> doesn't - so again, you could use a <code>ByteArrayInputStream</code>.
EDIT: It looks like <code>ZipArchiveInputStream</code> <em>doesn't</em> read from the central directory either. I was hoping it would use <code>markSupported</code> to check beforehand, but it appears not to...</p> <p>EDIT: In the comments on the question, I asked where you'd read that the zip file doesn't have to contain entry data. You quoted Wikipedia:</p> <blockquote> <p>"Tools that correctly read zip archives must scan for the signatures of the various fields, the zip central directory. They must not scan for entries because only the directory specifies where a file chunk starts. Scanning could lead to false positives, as the format doesn't forbid other data to be between chunks, or uncompressed stream containing such signatures."</p> </blockquote> <p>That's not the same as entry data being optional. It's saying that there may be <em>extra</em> data in awkward places, not that the entries may be missing completely. It's basically saying that the entries shouldn't be assumed to be <em>contiguous</em>. I could happily concede that <code>ZipInputStream</code> may not be reading the central directory at the end of the file, but finding code which does that isn't the same as finding code which copes with entry data not existing.</p> <p>You then write:</p> <blockquote> <p>I might further add that whether the zip is valid or not is not my concern. Working with it is.</p> </blockquote> <p>... which suggests you want code which will handle invalid zip files. Combined with this:</p> <blockquote> <p>I do not yet have access to the zip files I'll be handling, so I don't know whether I'll be able to handle them through the stream</p> </blockquote> <p>That means you're asking for code which should handle zip files which are invalid in ways you can't even predict. Just how invalid would it have to be for you to be able to reject it?
If I give you 1000 random bytes, with no attempt for them to be a zip file at all, what on earth would you do with it?</p> <p>Basically, you need to pin the problem down more tightly before it's feasible to even say whether a particular library is a valid solution. It's reasonable to collect a set of zip files from various places, which may be invalid in well-understood ways, and say "I must be able to support all of these." Later you <em>may</em> need to do some work if it turns out that wasn't good enough. But to be able to support anything, however broken, simply isn't a valid requirement.</p>
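<p>To make the byte-array approach concrete, here is a minimal sketch using only <code>java.util.zip</code>. The class name <code>InMemoryZip</code> and its three methods are hypothetical names chosen for illustration. It assumes a well-formed archive with no zip64 records; <code>readEntries</code> walks the entries in stream order, while <code>centralDirectoryEntryCount</code> only peeks at the end-of-central-directory record at the tail of the array so you can cross-check the two.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; the name and methods are illustrative only.
class InMemoryZip {

    // Builds a small two-entry archive in memory so the example has input.
    static byte[] buildSampleZip() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("alpha".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("beta".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads every entry in stream order from a byte array, with no file on disk.
    static Map<String, byte[]> readEntries(byte[] zipBytes) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            byte[] buffer = new byte[4096];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                entries.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    // Scans backwards for the end-of-central-directory signature (50 4B 05 06)
    // and returns the 16-bit little-endian "total entries" field at offset 10.
    // Returns -1 if no such record is found. Zip64 archives are not handled.
    static int centralDirectoryEntryCount(byte[] zipBytes) {
        for (int i = zipBytes.length - 22; i >= 0; i--) {
            if (zipBytes[i] == 0x50 && zipBytes[i + 1] == 0x4B
                    && zipBytes[i + 2] == 0x05 && zipBytes[i + 3] == 0x06) {
                return (zipBytes[i + 10] & 0xFF) | ((zipBytes[i + 11] & 0xFF) << 8);
            }
        }
        return -1;
    }
}
```

<p>If the count from the central directory disagrees with the number of entries the stream yields, that's a hint the archive is one of the awkward ones, and you can decide at that point whether to reject it or fall back to something more elaborate.</p>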
