Note that there are some explanatory texts on larger screens.

plurals
  1. POKeeping large dictionary in Python affects application performance
    text
    copied!<p>I'm having some difficulties understanding (and ultimately solving) why having a large dictionary in memory makes creation of other dictionaries longer.</p> <p>Here's the test code that I'm using</p> <pre><code>import time def create_dict(): # return {x:[x]*125 for x in xrange(0, 100000)} return {x:(x)*125 for x in xrange(0, 100000)} # UPDATED: to use tuples instead of list of values class Foo(object): @staticmethod def dict_init(): start = time.clock() Foo.sample_dict = create_dict() print "dict_init in Foo took {0} sec".format(time.clock() - start) if __name__ == '__main__': Foo.dict_init() for x in xrange(0, 10): start = time.clock() create_dict() print "Run {0} took {1} seconds".format(x, time.clock() - start) </code></pre> <p>If I run the code as is (first initializing sample_dict in Foo) and then creating the same dictionary 10 more times in the loop I get the following results:</p> <pre><code>dict_init in Foo took 0.385263764287 sec Run 0 took 0.548807949139 seconds Run 1 took 0.533209452471 seconds Run 2 took 0.51916067636 seconds Run 3 took 0.513130722575 seconds Run 4 took 0.508272050029 seconds Run 5 took 0.502263872177 seconds Run 6 took 0.48867601998 seconds Run 7 took 0.483109299676 seconds Run 8 took 0.479019713488 seconds Run 9 took 0.473174195256 seconds [Finished in 5.6s] </code></pre> <p>If, however, I do NOT initialize sample_dict in Foo (commenting out Foo.dict_init()) I'm getting almost 20% faster dictionary creation in the loop</p> <pre><code>Run 0 took 0.431378921359 seconds Run 1 took 0.423696636179 seconds Run 2 took 0.419630475616 seconds Run 3 took 0.405130343806 seconds Run 4 took 0.398099686921 seconds Run 5 took 0.392837169802 seconds Run 6 took 0.38799598399 seconds Run 7 took 0.375133006408 seconds Run 8 took 0.368755297573 seconds Run 9 took 0.363273701371 seconds [Finished in 4.0s] </code></pre> <p>I've noticed that if I turn OFF Python's garbage collector by calling gc.disable() performance not only improves ~5x but storing large dictionary in Foo doesn't make a difference. Here are results with garbage collection disabled:</p> <pre><code>dict_init in Foo took 0.0696136982496 sec Run 0 took 0.113533445358 seconds Run 1 took 0.111091241489 seconds Run 2 took 0.111151620212 seconds Run 3 took 0.110655722831 seconds Run 4 took 0.111807537706 seconds Run 5 took 0.11097510318 seconds Run 6 took 0.110936170451 seconds Run 7 took 0.111074414632 seconds Run 8 took 0.110678488579 seconds Run 9 took 0.111011066463 seconds </code></pre> <p>So I have 2 questions:</p> <ul> <li>Why does disabling garbage collection speeds up dictionary creation</li> <li>How to achieve even performance (with Foo init and w/o) without disable garbage collection</li> </ul> <p>I would appreciate any insight on this. </p> <p>Thank you!</p> <p>UPDATED: After Tim Peters mentioned that I'm creating mutable objects, I've decided to try to create immutable objects (tuples in my case) and voila - much, much faster results (same with using gc and without)</p> <pre><code>dict_init in Foo took 0.017769 sec Run 0 took 0.017547 seconds Run 1 took 0.013234 seconds Run 2 took 0.012791 seconds Run 3 took 0.013371 seconds Run 4 took 0.013288 seconds Run 5 took 0.013692 seconds Run 6 took 0.013059 seconds Run 7 took 0.013311 seconds Run 8 took 0.013343 seconds Run 9 took 0.013675 seconds </code></pre> <p>I understand that tuple creation is much faster than list creation but why does having a dictionary of immutable objects doesn't affect time spent by garbage collection? Are immutable objects not involved in reference cycle?</p> <p>Thank you.</p> <p>P.S. As it happens, in my real-world scenario, converting list to tuples solved the problem (there was never a need to have lists, just didn't think of using tuples) but I'm still curious as to why it's faster.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload