Handle gzipped or bzip2ed downloads without keeping compressed data
<p>I'd like to download a compressed file (either gzip or bzip2), decompress it, and analyze its contents <em>while</em> the download happens, so that I can show partial results before the download ends. It's a CSV-like file with lots of data; I calculate sums, averages and such for certain columns. The file is big (4GB) and the decompressed stream is even bigger, so I don't want to keep the whole compressed file on disk or in memory.</p>
<p>I thought it would be possible to combine Python's gzip or bz2 implementations with urllib2:</p>
<pre><code>import csv
import gzip
import urllib2

# Wrap the HTTP response in GzipFile and parse it as tab-separated values.
data_stream = csv.reader(
    gzip.GzipFile(fileobj=urllib2.urlopen('http://…/somefile.gz')),
    delimiter='\t')
</code></pre>
<p>…but it seems that urlopen's file is not file-like enough for GzipFile. I get a traceback as soon as I try to read from such a stream:</p>
<pre><code>Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
  File "/usr/lib/python2.7/gzip.py", line 450, in readline
    c = self.read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 256, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 283, in _read
    pos = self.fileobj.tell()   # Save current position
AttributeError: addinfourl instance has no attribute 'tell'
</code></pre>
<p>The bz2 module is even worse: it doesn't allow passing a file object at all.</p>
<p>After looking for answers, I found <a href="https://stackoverflow.com/q/4204604/42610">this question</a>. The answer there works by basically storing the whole compressed file in memory, which is infeasible for me.</p>
<p>What can I do?</p>
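One possible direction, offered as a minimal sketch rather than a definitive answer: the standard library's incremental decompressors (`zlib.decompressobj` and `bz2.BZ2Decompressor`) accept compressed bytes chunk by chunk, so neither `tell()` nor a complete file is needed. The helper names below (`iter_decompressed`, `iter_lines`, `read_chunks`, `column_sum`) are hypothetical and not part of any library. The sketch is written for Python 3 and assumes a single compressed stream (no concatenated gzip members), tab-separated UTF-8 text, and numeric values in the chosen column.

```python
import bz2
import zlib


def read_chunks(fileobj, size=64 * 1024):
    """Yield fixed-size byte chunks from any object with a read() method."""
    while True:
        chunk = fileobj.read(size)
        if not chunk:
            break
        yield chunk


def iter_decompressed(chunks, kind="gzip"):
    """Yield decompressed byte blocks from an iterable of compressed chunks."""
    if kind == "gzip":
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    elif kind == "bz2":
        decomp = bz2.BZ2Decompressor()
    else:
        raise ValueError("unsupported compression: %r" % (kind,))
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    if kind == "gzip":
        # Emit anything zlib is still buffering (BZ2Decompressor has no flush).
        tail = decomp.flush()
        if tail:
            yield tail


def iter_lines(blocks):
    """Reassemble arbitrary byte blocks into complete lines (newline removed)."""
    pending = b""
    for block in blocks:
        pending += block
        lines = pending.split(b"\n")
        pending = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line
    if pending:
        yield pending


def column_sum(chunks, column, kind="gzip"):
    """Sum one tab-separated column while the compressed data streams in."""
    total = 0.0
    for line in iter_lines(iter_decompressed(chunks, kind)):
        fields = line.decode("utf-8").split("\t")
        total += float(fields[column])
    return total
```

In this sketch only the current compressed chunk, the decompressor's internal state, and at most one partial line are held in memory at a time. The chunks would come from something like `read_chunks(response)`, where `response` is the object returned by `urllib2.urlopen` (or `urllib.request.urlopen` in Python 3). To show partial results, `column_sum` could be turned into a generator that yields the running total every so often.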