Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p><strong>The poor performance you observe is caused by a bug in the Python garbage collector in the version you are using.</strong> Upgrade to Python 2.7, or 3.1 or above to regain the amoritized 0(1) behavior expected of list appending in Python.</p> <p><em>If you cannot upgrade, disable garbage collection as you build the list and turn it on after you finish.</em> </p> <p>(You can also tweak the garbage collector's triggers or selectively call collect as you progress, but I do not explore these options in this answer because they are more complex and I suspect your use case is amenable to the above solution.)</p> <p>Background:</p> <p>See: <a href="https://bugs.python.org/issue4074" rel="nofollow noreferrer">https://bugs.python.org/issue4074</a> and also <a href="https://docs.python.org/release/2.5.2/lib/module-gc.html" rel="nofollow noreferrer">https://docs.python.org/release/2.5.2/lib/module-gc.html</a></p> <p>The reporter observes that appending complex objects (objects that aren't numbers or strings) to a list slows linearly as the list grows in length.</p> <p>The reason for this behavior is that the garbage collector is checking and rechecking every object in the list to see if they are eligible for garbage collection. This behavior causes the linear increase in time to add objects to a list. A fix is expected to land in py3k, so it should not apply to the interpreter you are using.</p> <p>Test:</p> <p>I ran a test to demonstrate this. For 1k iterations I append 10k objects to a list, and record the runtime for each iteration. The overall runtime difference is immediately obvious. With garbage collection disabled during the inner loop of the test, runtime on my system is 18.6s. With garbage collection enabled for the entire test, runtime is 899.4s.</p> <p>This is the test:</p> <pre><code>import time import gc class A: def __init__(self): self.x = 1 self.y = 2 self.why = 'no reason' def time_to_append(size, append_list, item_gen): t0 = time.time() for i in xrange(0, size): append_list.append(item_gen()) return time.time() - t0 def test(): x = [] count = 10000 for i in xrange(0,1000): print len(x), time_to_append(count, x, lambda: A()) def test_nogc(): x = [] count = 10000 for i in xrange(0,1000): gc.disable() print len(x), time_to_append(count, x, lambda: A()) gc.enable() </code></pre> <p>Full source: <a href="https://hypervolu.me/~erik/programming/python_lists/listtest.py.txt" rel="nofollow noreferrer">https://hypervolu.me/~erik/programming/python_lists/listtest.py.txt</a></p> <p>Graphical result: Red is with gc on, blue is with gc off. y-axis is seconds scaled logarithmically.</p> <p><a href="http://hypervolu.me/~erik/programming/python_lists/gc.png" rel="nofollow noreferrer">http://hypervolu.me/~erik/programming/python_lists/gc.png</a></p> <p>As the two plots differ by several orders of magnitude in the y component, here they are independently with the y-axis scaled linearly.</p> <p><a href="http://hypervolu.me/~erik/programming/python_lists/gc_on.png" rel="nofollow noreferrer">http://hypervolu.me/~erik/programming/python_lists/gc_on.png</a></p> <p><a href="http://hypervolu.me/~erik/programming/python_lists/gc_off.png" rel="nofollow noreferrer">http://hypervolu.me/~erik/programming/python_lists/gc_off.png</a></p> <p>Interestingly, with garbage collection off, we see only small spikes in runtime per 10k appends, which suggests that Python's list reallocation costs are relatively low. In any case, they are many orders of magnitude lower than the garbage collection costs.</p> <p>The density of the above plots make it difficult to see that with the garbage collector on, most intervals actually have good performance; it's only when the garbage collector cycles that we encounter the pathological behavior. You can observe this in this histogram of 10k append time. Most of the datapoints fall around 0.02s per 10k appends.</p> <p><a href="http://hypervolu.me/~erik/programming/python_lists/gc_on.hist.png" rel="nofollow noreferrer">http://hypervolu.me/~erik/programming/python_lists/gc_on.hist.png</a></p> <p>The raw data used to produce these plots can be found at <a href="http://hypervolu.me/~erik/programming/python_lists/" rel="nofollow noreferrer">http://hypervolu.me/~erik/programming/python_lists/</a></p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload