Strange "BadZipfile: Bad CRC-32" problem
<p>This code is a simplification of code in a Django app that receives an uploaded zip file via HTTP multipart POST and does read-only processing of the data inside:</p>

<pre><code>#!/usr/bin/env python

import csv, sys, StringIO, traceback, zipfile
try:
    import io
except ImportError:
    sys.stderr.write('Could not import the `io` module.\n')

def get_zip_file(filename, method):
    if method == 'direct':
        return zipfile.ZipFile(filename)  # let ZipFile open the path itself
    elif method == 'StringIO':
        data = open(filename, 'rb').read()  # zip data is binary
        return zipfile.ZipFile(StringIO.StringIO(data))
    elif method == 'BytesIO':
        data = open(filename, 'rb').read()  # zip data is binary
        return zipfile.ZipFile(io.BytesIO(data))


def process_zip_file(filename, method, open_defaults_file):
    zip_file = get_zip_file(filename, method)
    items_file = zip_file.open('items.csv')
    csv_file = csv.DictReader(items_file)

    try:
        for idx, row in enumerate(csv_file):
            image_filename = row['image1']

            if open_defaults_file:
                # Open a second member while items.csv is still being read.
                z = zip_file.open('defaults.csv')
                z.close()

        sys.stdout.write('Processed %d items.\n' % idx)
    except zipfile.BadZipfile:
        sys.stderr.write('Processing failed on item %d\n\n%s'
                         % (idx, traceback.format_exc()))


process_zip_file(sys.argv[1], sys.argv[2], int(sys.argv[3]))
</code></pre>

<p>Pretty simple. We open the zip file and one or two CSV files inside the zip file.</p>

<p>What's weird is what happens when I run this with a large zip file (~13 MB), have it instantiate the <code>ZipFile</code> from a <code>StringIO.StringIO</code> or an <code>io.BytesIO</code>, and have it open TWO CSV files rather than just one: it fails towards the end of processing. (Perhaps anything other than a plain filename triggers this? I had similar problems in the Django app when trying to create a <code>ZipFile</code> from a <code>TemporaryUploadedFile</code>, or even from a file object created by calling <code>os.tmpfile()</code> and <code>shutil.copyfileobj()</code>.) Here's the output that I see on a Linux system:</p>

<pre><code>$ ./test_zip_file.py ~/data.zip direct 1
Processed 250 items.

$ ./test_zip_file.py ~/data.zip StringIO 1
Processing failed on item 242

Traceback (most recent call last):
  File "./test_zip_file.py", line 26, in process_zip_file
    for idx, row in enumerate(csv_file):
  File ".../python2.7/csv.py", line 104, in next
    row = self.reader.next()
  File ".../python2.7/zipfile.py", line 523, in readline
    return io.BufferedIOBase.readline(self, limit)
  File ".../python2.7/zipfile.py", line 561, in peek
    chunk = self.read(n)
  File ".../python2.7/zipfile.py", line 581, in read
    data = self.read1(n - len(buf))
  File ".../python2.7/zipfile.py", line 641, in read1
    self._update_crc(data, eof=eof)
  File ".../python2.7/zipfile.py", line 596, in _update_crc
    raise BadZipfile("Bad CRC-32 for file %r" % self.name)
BadZipfile: Bad CRC-32 for file 'items.csv'

$ ./test_zip_file.py ~/data.zip BytesIO 1
Processing failed on item 242

Traceback (most recent call last):
  File "./test_zip_file.py", line 26, in process_zip_file
    for idx, row in enumerate(csv_file):
  File ".../python2.7/csv.py", line 104, in next
    row = self.reader.next()
  File ".../python2.7/zipfile.py", line 523, in readline
    return io.BufferedIOBase.readline(self, limit)
  File ".../python2.7/zipfile.py", line 561, in peek
    chunk = self.read(n)
  File ".../python2.7/zipfile.py", line 581, in read
    data = self.read1(n - len(buf))
  File ".../python2.7/zipfile.py", line 641, in read1
    self._update_crc(data, eof=eof)
  File ".../python2.7/zipfile.py", line 596, in _update_crc
    raise BadZipfile("Bad CRC-32 for file %r" % self.name)
BadZipfile: Bad CRC-32 for file 'items.csv'

$ ./test_zip_file.py ~/data.zip StringIO 0
Processed 250 items.

$ ./test_zip_file.py ~/data.zip BytesIO 0
Processed 250 items.
</code></pre>

<p>Incidentally, the code fails under the same conditions but in a different way on my OS X system. Instead of raising the <code>BadZipfile</code> exception, it seems to read corrupted data and gets very confused.</p>

<p>This all suggests to me that I am doing something in this code that you are not supposed to do. For example, is it wrong to call <code>ZipFile.open</code> on one member while another member of the same zip file object is still open? This doesn't seem to be a problem when using <code>ZipFile(filename)</code>, but perhaps it's problematic when passing <code>ZipFile</code> a file-like object, because of some implementation details in the <code>zipfile</code> module?</p>

<p>Perhaps I missed something in the <code>zipfile</code> docs? Or maybe it's not documented yet? Or (least likely) is it a bug in the <code>zipfile</code> module?</p>
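<p>A smaller, self-contained way to probe the "second member opened while the first is still being read" hypothesis might look like the sketch below. It is not the original script: the archive is built in memory, and the helper names (<code>build_archive</code>, <code>interleaved_read</code>), the row count, and the file contents are all made up for illustration. It is written for Python 3, where I would expect the interleaved read to succeed; the traceback above came from the Python 2.7 script, not from this sketch.</p>

```python
import io
import zipfile


def build_archive(rows=5000):
    """Build an in-memory zip with two CSV members (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # One header line plus `rows` data lines.
        items = "image1\n" + "".join("img%05d.png\n" % i for i in range(rows))
        zf.writestr("items.csv", items)
        zf.writestr("defaults.csv", "key,value\ncolor,red\n")
    buf.seek(0)
    return buf


def interleaved_read(fileobj):
    """Read items.csv line by line while repeatedly opening defaults.csv.

    Returns the number of lines read from items.csv. A failed CRC check
    would surface as zipfile.BadZipFile.
    """
    count = 0
    with zipfile.ZipFile(fileobj) as zf:
        with zf.open("items.csv") as items:
            for _line in items:
                # Open and read a second member while the first is still open.
                with zf.open("defaults.csv") as other:
                    other.read()
                count += 1
    return count


if __name__ == "__main__":
    print(interleaved_read(build_archive()))
```

<p>If a run of this raised a bad-CRC error or returned fewer lines than were written, that would point at the two open members interfering with each other through the shared file-like object rather than at anything specific to the uploaded data.</p>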