Passing iterators to any for execution for speed and Why?
<p>Questions are summarized here. Yes, I know <strong>some</strong> of these answers ;) and I can do some hand-waving on others, but I'd really like to get to the nitty-gritty here.</p>
<ol>
<li>Is this even a good idea at all? (This one is <strong>not</strong> below)</li>
<li>I wonder if the map actually adds speed improvements? Why?</li>
<li>Why in the world would passing iterators to <a href="http://docs.python.org/library/functions.html#any" rel="nofollow noreferrer">any</a> make my code faster?</li>
<li>Why did my Counter object work and my print_true function fail miserably?</li>
<li>Is there an equivalent to <a href="http://docs.python.org/library/itertools.html#itertools.imap" rel="nofollow noreferrer">itertools.imap</a> that will just call a function over and over again, and optionally a certain number of times?</li>
<li>Where is my carrot?!?</li>
</ol>
<hr>
<p>I just watched <a href="http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-how-dropbox-did-it-and-how-python-helped-4896698#disqus_thread" rel="nofollow noreferrer">PyCon 2011: How Dropbox Did It and How Python Helped</a> (admittedly I skipped through most of the parts), but FINALLY the really interesting stuff started at around 22:23.</p>
<p>The speaker advocated making your inner loops in C and said that "run once" stuff doesn't need much optimization (makes sense)... then he goes on to state...
paraphrased:</p>
<blockquote> <p>Pass a composition of iterators to <a href="http://docs.python.org/library/functions.html#any" rel="nofollow noreferrer">any</a> for massive speed improvements.</p> </blockquote>
<p>Here is the code (hopefully it's identical):</p>
<pre class="lang-python prettyprint-override"><code>import itertools, hashlib, time
_md5 = hashlib.md5()
def run():
    for i in itertools.repeat("foo", 10000000):
        _md5.update(i)
a = time.time(); run(); time.time() - a
Out[118]: 9.44077205657959

_md5 = hashlib.md5()
def run():
    any(itertools.imap(_md5.update, itertools.repeat("foo", 10000000)))
a = time.time(); run(); time.time() - a
Out[121]: 6.547091007232666
</code></pre>
<p>Hmm, looks like for even greater speed improvements I can just get a faster computer! (Judging from his slide.)</p>
<p>He then does a bunch of hand-waving without actually going into details as to <strong>why</strong>.</p>
<p>I already knew about the iterators from the answer to <a href="https://stackoverflow.com/questions/2970780/pythonic-way-to-do-something-n-times">pythonic way to do something N times without an index variable?</a> thanks to Alex Martelli.</p>
<p>Then I thought, I wonder if the map actually adds speed improvements? My final thought was WTF??? passing to <a href="http://docs.python.org/library/functions.html#any" rel="nofollow noreferrer">any</a>? REALLY???
Surely that can't be right since the documentation defines <a href="http://docs.python.org/library/functions.html#any" rel="nofollow noreferrer">any</a> as:</p>
<pre class="lang-python prettyprint-override"><code>def any(iterable):
    for element in iterable:
        if element:
            return True
    return False
</code></pre>
<p>Why in the world would passing iterators to any make my code faster?</p>
<p>I then tested it using the following (among many other tests) but this is what gets me:</p>
<pre class="lang-python prettyprint-override"><code>def print_true(x):
    print 'True'
    return 'Awesome'

def test_for_loop_over_iter_map_over_iter_repeat():
    for result in itertools.imap(print_true, itertools.repeat("foo", 5)):
        pass

def run_any_over_iter_map_over_iter_repeat():
    any(itertools.imap(print_true, itertools.repeat("foo", 5)))

# And the runs:

In [67]: test_for_loop_over_iter_map_over_iter_repeat()
True
True
True
True
True

In [74]: run_any_over_iter_map_over_iter_repeat()
True
</code></pre>
<p>For shame. I declared this GUY IS FULL OF IT. <strong>Heretic</strong>! But, I calmed down and continued to test. <strong>If this were true how in the hell could Dropbox even work!?!?</strong></p>
<p>And with further testing it did work...
I initially just used a simple counter object, and it counted all the way up to 10000000 in both cases.</p>
<p>So the question is why did my Counter object work and my print_true function fail miserably?</p>
<pre class="lang-python prettyprint-override"><code>class Counter(object):
    count = 0
    def count_one(self, none):
        self.count += 1

def run_any_counter():
    counter = Counter()
    any(itertools.imap(counter.count_one, itertools.repeat("foo", 10000000)))
    print counter.count

def run_for_counter():
    counter = Counter()
    for result in itertools.imap(counter.count_one, itertools.repeat("foo", 10000000)):
        pass
    print counter.count
</code></pre>
<p>Output:</p>
<pre class="lang-python prettyprint-override"><code>%time run_for_counter()
10000000
CPU times: user 5.54 s, sys: 0.03 s, total: 5.57 s
Wall time: 5.68 s

%time run_any_counter()
10000000
CPU times: user 5.28 s, sys: 0.02 s, total: 5.30 s
Wall time: 5.40 s
</code></pre>
<p>An even bigger WTF is that even after removing the unneeded argument and writing the most sensible code for my Counter object, it's STILL slower than the any-map version. Where is my carrot?!?:</p>
<pre class="lang-python prettyprint-override"><code>class CounterNoArg(object):
    count = 0
    def count_one(self):
        self.count += 1

def straight_count():
    counter = CounterNoArg()
    for _ in itertools.repeat(None, 10000000):
        counter.count_one()
    print counter.count
</code></pre>
<p>Output:</p>
<pre class="lang-python prettyprint-override"><code>In [111]: %time straight_count()
10000000
CPU times: user 5.44 s, sys: 0.02 s, total: 5.46 s
Wall time: 5.60 s
</code></pre>
<p>I'm asking because I think the Pythonistas or Pythoneers need a carrot so we don't start passing stuff to <a href="http://docs.python.org/library/functions.html#any" rel="nofollow noreferrer">any</a> or <a href="http://docs.python.org/library/functions.html#all" rel="nofollow noreferrer">all</a> for a performance increase, or does one already exist?
Possibly an equivalent to <a href="http://docs.python.org/library/itertools.html#itertools.imap" rel="nofollow noreferrer">itertools.imap</a> that will just call a function over and over again, and optionally a certain number of times.</p>
<p>The best I've managed is the following (using a list comprehension gives interesting results):</p>
<pre class="lang-python prettyprint-override"><code>def super_run():
    counter = CounterNoArg()
    for _ in (call() for call in itertools.repeat(counter.count_one, 10000000)):
        pass
    print counter.count

def super_counter_run():
    counter = CounterNoArg()
    [call() for call in itertools.repeat(counter.count_one, 10000000)]
    print counter.count

def run_any_counter():
    counter = Counter()
    any(itertools.imap(counter.count_one, itertools.repeat("foo", 10000000)))
    print counter.count

%time super_run()
10000000
CPU times: user 5.23 s, sys: 0.03 s, total: 5.26 s
Wall time: 5.43 s

%time super_counter_run()
10000000
CPU times: user 4.75 s, sys: 0.18 s, total: 4.94 s
Wall time: 5.80 s

%time run_any_counter()
10000000
CPU times: user 5.15 s, sys: 0.06 s, total: 5.21 s
Wall time: 5.30 s

def run_any_like_presentation():
    any(itertools.imap(_md5.update, itertools.repeat("foo", 10000000)))

def super_run_like_presentation():
    [do_work for do_work in itertools.imap(_md5.update, itertools.repeat("foo", 10000000))]

def super_run_like_presentation_2():
    [_md5.update(foo) for foo in itertools.repeat("foo", 10000000)]

%time run_any_like_presentation()
CPU times: user 5.28 s, sys: 0.02 s, total: 5.29 s
Wall time: 5.47 s

%time super_run_like_presentation()
CPU times: user 6.14 s, sys: 0.18 s, total: 6.33 s
Wall time: 7.56 s

%time super_run_like_presentation_2()
CPU times: user 8.44 s, sys: 0.22 s, total: 8.66 s
Wall time: 9.59 s
</code></pre>
<p>Ugh...</p>
<p>Note: I encourage you to run the tests yourself.</p>
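<p>For anyone reproducing the Counter versus print_true discrepancy, here is a minimal sketch of what seems to be going on, going only by the documented definition of any quoted above. It is written for Python 3, where the built-in map is lazy and stands in for itertools.imap. The return_value parameter and the run_consume helper are hypothetical names made up for this illustration; the zero-length deque idiom is borrowed from the "consume" recipe in the itertools documentation, and no claim is made here about its speed.</p>

```python
import collections
import itertools


class Counter(object):
    """Counts calls. return_value is a hypothetical knob for this sketch."""

    def __init__(self, return_value=None):
        self.count = 0
        self.return_value = return_value

    def count_one(self, _item):
        self.count += 1
        return self.return_value


def run_any(counter, n):
    # Python 3's built-in map is lazy, like itertools.imap in Python 2.
    # any() stops pulling items at the first truthy result.
    any(map(counter.count_one, itertools.repeat("foo", n)))
    return counter.count


def run_consume(counter, n):
    # A zero-length deque pulls every item and stores none of them,
    # so the return values are never inspected.
    calls = map(counter.count_one, itertools.repeat("foo", n))
    collections.deque(calls, maxlen=0)
    return counter.count


# Returning None (like count_one and _md5.update in the question): all 5 calls run.
falsy_calls = run_any(Counter(return_value=None), 5)
# Returning 'Awesome' (like print_true): any() short-circuits after 1 call.
truthy_calls = run_any(Counter(return_value="Awesome"), 5)
# The deque-based consumer ignores truthiness: all 5 calls run.
consumed_calls = run_consume(Counter(return_value="Awesome"), 5)

print(falsy_calls, truthy_calls, consumed_calls)  # prints: 5 1 5
```

<p>So any only exhausts the iterator when every call returns something falsy, which would explain why the counter reached 10000000 while print_true printed once.</p>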