<p>The entries relevant for possible optimization are those with high values for <em>ncalls</em> and <em>tottime</em>. <code>bgchr:4(&lt;module&gt;)</code> and <code>&lt;string&gt;:1(&lt;module&gt;)</code> probably refer to the execution of your module body and are not relevant here.</p> <p>Your performance problem clearly comes from string processing, so that is what you should try to reduce. The hot spots are <code>split</code>, <code>join</code> and <code>sys.stdout.write</code>. <code>bz2.decompress</code> also seems to be costly.</p> <p>I suggest you try the following:</p> <ul> <li>Your main data seems to consist of tab-separated values. Check whether the <code>csv</code> reader performs better.</li> <li><code>sys.stdout</code> may be line buffered (it is when connected to a terminal), in which case it is flushed each time a newline is written. Consider writing to a file with a larger buffer size.</li> <li>Instead of joining elements before writing them out, write them sequentially to the output file. You may also consider using the <code>csv</code> writer.</li> <li>Instead of decompressing the data all at once into a single string, use a <code>BZ2File</code> object and pass that to the <code>csv</code> reader.</li> </ul> <p>It seems that the loop body that actually decompresses the data is invoked only once. Perhaps you can find a way to avoid the call <code>dataHandle.read(size)</code>, which produces a huge string that is then decompressed, and work with the file object directly instead.</p> <p><strong>Addendum:</strong> <code>BZ2File</code> is probably not applicable in your case because, at least in Python versions before 3.3, it requires a filename argument. What you need is something like a file object view with an integrated read limit, comparable to <code>ZipExtFile</code> but using <code>BZ2Decompressor</code> for decompression.</p> <p>My main point is that your code should process the data iteratively instead of slurping it in as a whole and splitting it again afterwards.</p>
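<p>The iterative approach described above could look roughly like the following. This is only a sketch under assumptions: the helper names <code>iter_decompressed_lines</code> and <code>copy_rows</code>, the chunk size and the UTF-8 encoding are made up for illustration, and it assumes that the compressed block occupies exactly the next <code>size</code> bytes of the handle and that no field contains an embedded newline.</p>

```python
import bz2
import csv


def iter_decompressed_lines(handle, size, chunk_size=64 * 1024, encoding="utf-8"):
    """Yield text lines from a bz2 stream stored in the next `size` bytes of `handle`.

    Hypothetical helper: it reads at most `size` compressed bytes in small
    chunks, so the whole block is never held in memory as one string.
    """
    decompressor = bz2.BZ2Decompressor()
    remaining = size
    pending = b""  # partial last line carried over to the next chunk
    while remaining > 0 and not decompressor.eof:
        chunk = handle.read(min(chunk_size, remaining))
        if not chunk:
            break  # handle ended before `size` bytes were available
        remaining -= len(chunk)
        data = pending + decompressor.decompress(chunk)
        lines = data.split(b"\n")
        pending = lines.pop()  # incomplete line, or b"" if data ended with a newline
        for line in lines:
            yield line.decode(encoding)
    if pending:
        yield pending.decode(encoding)


def copy_rows(handle, size, out, chunk_size=64 * 1024):
    """Stream tab-separated rows from the compressed block to `out`; return the row count."""
    reader = csv.reader(iter_decompressed_lines(handle, size, chunk_size), delimiter="\t")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for row in reader:
        # Any per-row processing would go here, instead of split/join on one huge string.
        writer.writerow(row)
        count += 1
    return count
```

<p>For the output, one option is to pass a file opened with a larger buffer instead of <code>sys.stdout</code>, for example <code>open(path, "w", buffering=1024 * 1024, newline="")</code>, where <code>path</code> is whatever output file you choose. Whether this is actually faster for your data is something only a new profiling run can tell.</p>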
 
