
BZ2_bzDecompress way slower than bzip2 command
<p>I'm using mmap/read + BZ2_bzDecompress to sequentially decompress a large file (29 GB). I need to parse the uncompressed XML data, but only small parts of it, so it seemed far more efficient to process it sequentially than to decompress the whole file (400 GB uncompressed) and then parse it. Interestingly, already the decompression part is extremely slow: while the shell command bzip2 manages a bit more than 52 MB/s (measured over several runs of <code>timeout 10 bzip2 -c -k -d input.bz2 &gt; output</code>, dividing the produced file size by 10), my program doesn't even reach 2 MB/s, slowing down to 1.2 MB/s after a few seconds.</p>

<p>The file I'm trying to process contains multiple bz2 streams, so I check the return value of <code>BZ2_bzDecompress</code> for <code>BZ_STREAM_END</code>, and if it occurs, call <code>BZ2_bzDecompressEnd( strm );</code> and <code>BZ2_bzDecompressInit( strm, 0, 0 )</code> to restart with the next stream, in case the file hasn't been completely processed. I also tried it without <code>BZ2_bzDecompressEnd</code>, but that didn't change anything (and the documentation doesn't really say how multiple streams should be handled correctly).</p>

<p>The file is mmap'ed beforehand, where I also tried different combinations of flags, currently <code>PROT_READ</code>, <code>MAP_PRIVATE</code>, with madvise set to <code>MADV_SEQUENTIAL | MADV_WILLNEED | MADV_HUGEPAGE</code> (I check the return value, and madvise does not report any problems; I'm on a Linux kernel 3.2.x Debian setup which has hugepage support).</p>

<p>When profiling I made sure that, apart from some counters for measuring speed and a printf limited to once every n iterations, nothing else was run. This is on a modern multicore server processor where all other cores were idle, and it's bare metal, not virtualized.</p>

<p>Any ideas on what I could be doing wrong, or could do to improve performance?</p>

<p>Update: Thanks to James Chong's suggestion I tried swapping <code>mmap()</code> for <code>read()</code>, and the speed is still the same. So <code>mmap()</code> doesn't seem to be the problem (either that, or <code>mmap()</code> and <code>read()</code> share an underlying problem).</p>

<p>Update 2: Thinking that the malloc/free calls done in bzDecompressInit/bzDecompressEnd might be the cause, I set bzalloc/bzfree of the bz_stream struct to a custom implementation which allocates memory only the first time and does not free it unless a flag is set (passed via the opaque parameter = strm.opaque). It works perfectly fine, but again the speed did not increase.</p>

<p>Update 3: I also tried fread() instead of read(), and the speed still stays the same. I also tried different numbers of bytes per read and different decompressed-data buffer sizes: no change.</p>

<p>Update 4: Reading speed is definitely not the issue, as I've been able to achieve close to 120 MB/s in sequential reads using just mmap().</p>