<p>To answer the first question, about optimizing by passing the iterator to any: no, I believe it is not a good idea, mainly because that is not its intended purpose. Sure, it's easy to implement, but maintenance could become a nightmare. Doing this introduces a new gotcha into your code base: any stops consuming the iterator at the first truthy value, so if the mapped function ever returns something truthy, the iterator will not be fully consumed, causing strange behavior and bugs that are hard to track down. Also, there are alternatives that are faster than (or at least nearly as fast as) the built-in any.</p>
<p>Of course, you could make an exception, since it seems any can actually outperform deque, but using any this way is certainly extreme and most often unnecessary. In fact, you may be introducing optimizations that are no longer "optimal" after updates to the Python code base (compare the 2.7 and 3.2 results).</p>
<p>Another thing to mention is that this use of any doesn't immediately make sense to a reader. Whether to implement a C extension before resorting to any like this is also debatable. Personally, I'd prefer the C extension for semantic reasons.</p>
<p>As far as optimizing your own code goes, let's start with what we're up against: refer to run_any_like_presentation. It's pretty fast :)</p>
<p>An initial implementation could look something like:</p>
<pre><code>import itertools, hashlib

_md5 = hashlib.md5()

def run():
    for _ in xrange(100000000):
        _md5.update("foo")
</code></pre>
<p>The first step is using itertools.repeat to do something N times.</p>
<pre><code>def run_just_repeat():
    for foo in itertools.repeat("foo", 100000000):
        _md5.update(foo)
</code></pre>
<p>The second optimization is to use itertools.imap, which gives a speed increase because the foo reference no longer has to be passed to _md5.update in Python code.
The call to _md5.update is now made from C.</p>
<pre><code>def run_imap_and_repeat():
    for do_work in itertools.imap(_md5.update, itertools.repeat("foo", 100000000)):
        pass
</code></pre>
<p>The third optimization is to move the for loop entirely into C code. Passing maxlen=0 tells the deque to discard every item, so the iterator is consumed without storing the results.</p>
<pre><code>import collections

def run_deque_imap_and_repeat():
    collections.deque(itertools.imap(_md5.update, itertools.repeat("foo", 100000000)), maxlen=0)
</code></pre>
<p>The final optimization is to move all potential lookups into the namespace of the run function by binding them as default argument values.</p>
<p>This idea is taken from the very end of <a href="http://docs.python.org/library/itertools.html?highlight=itertools" rel="nofollow">http://docs.python.org/library/itertools.html?highlight=itertools</a>:</p>
<blockquote> <p>Note, many of the above recipes can be optimized by replacing global lookups with local variables defined as default values.</p> </blockquote>
<p>Personally, I had mixed success with this: small improvements under certain conditions, and using <code>from module import xxx</code> also showed performance gains without passing the names in as defaults. Also, when I passed in some variables but not others, I saw slight differences as well. The point is, this is one you're going to need to test yourself to see if it works for you.
</p>
<pre><code>def run_deque_imap_and_repeat_all_local(deque=collections.deque, imap=itertools.imap,
                                        _md5=_md5, repeat=itertools.repeat, md5=hashlib.md5):
    deque(imap(_md5.update, repeat("foo", 100000000)), maxlen=0)
</code></pre>
<p>And finally, to be fair, let's implement an any version like the presentation's that applies the final optimization as well.</p>
<pre><code>def run_any_like_presentation_all_local(any=any, deque=collections.deque, imap=itertools.imap,
                                        _md5=_md5, repeat=itertools.repeat, md5=hashlib.md5):
    any(imap(_md5.update, repeat("foo", 100000000)))
</code></pre>
<p>OK, now let's run some tests (Python 2.7.2, OS X Snow Leopard, 64-bit):</p>
<ul>
<li><p>run_reference - 123.913 seconds</p></li>
<li><p>run_deque_imap_and_repeat_all_local - 51.201 seconds</p></li>
<li><p>run_deque_local_imap_and_repeat - 53.013 seconds</p></li>
<li><p>run_deque_imap_and_repeat - 48.913 seconds</p></li>
<li><p>run_any_like_presentation - 49.833 seconds</p></li>
<li><p>run_any_like_presentation_all_local - 47.780 seconds</p></li>
</ul>
<p>And just for kicks, in Python 3 (Python 3.2, OS X Snow Leopard, 64-bit):</p>
<ul>
<li><p>run_reference - 94.273 seconds (100000004 function calls!)</p></li>
<li><p>run_deque_imap_and_repeat_all_local - 23.929 seconds</p></li>
<li><p>run_deque_local_imap_and_repeat - 23.298 seconds</p></li>
<li><p>run_deque_imap_and_repeat - 24.201 seconds</p></li>
<li><p>run_any_like_presentation - 24.026 seconds</p></li>
<li><p>run_any_like_presentation_all_local - 25.316 seconds</p></li>
</ul>
<p>Here's my source for the tests:</p>
<pre><code>import itertools, hashlib, collections

_md5 = hashlib.md5()

def run_reference():
    for _ in xrange(100000000):
        _md5.update("foo")

def run_deque_imap_and_repeat_all_local(deque=collections.deque, imap=itertools.imap,
                                        _md5=_md5, repeat=itertools.repeat, md5=hashlib.md5):
    deque(imap(_md5.update, repeat("foo", 100000000)), maxlen=0)

def run_deque_local_imap_and_repeat(deque=collections.deque, imap=itertools.imap,
                                    _md5=_md5, repeat=itertools.repeat, md5=hashlib.md5):
    deque(imap(_md5.update, repeat("foo", 100000000)), maxlen=0)

def run_deque_imap_and_repeat():
    collections.deque(itertools.imap(_md5.update, itertools.repeat("foo", 100000000)), maxlen=0)

def run_any_like_presentation():
    any(itertools.imap(_md5.update, itertools.repeat("foo", 100000000)))

def run_any_like_presentation_all_local(any=any, deque=collections.deque, imap=itertools.imap,
                                        _md5=_md5, repeat=itertools.repeat, md5=hashlib.md5):
    any(imap(_md5.update, repeat("foo", 100000000)))

import cProfile
import pstats

def performance_test(a_func):
    cProfile.run(a_func, 'stats')
    p = pstats.Stats('stats')
    p.sort_stats('time').print_stats(10)

performance_test('run_reference()')
performance_test('run_deque_imap_and_repeat_all_local()')
performance_test('run_deque_local_imap_and_repeat()')
performance_test('run_deque_imap_and_repeat()')
performance_test('run_any_like_presentation()')
performance_test('run_any_like_presentation_all_local()')
</code></pre>
<p>And Python 3:</p>
<pre><code>import itertools, hashlib, collections

_md5 = hashlib.md5()

def run_reference(foo="foo".encode('utf-8')):
    for _ in range(100000000):
        _md5.update(foo)

def run_deque_imap_and_repeat_all_local(deque=collections.deque, imap=map,
                                        _md5=_md5, repeat=itertools.repeat, md5=hashlib.md5):
    deque(imap(_md5.update, repeat("foo".encode('utf-8'), 100000000)), maxlen=0)

def run_deque_local_imap_and_repeat(deque=collections.deque, imap=map,
                                    _md5=_md5, repeat=itertools.repeat, md5=hashlib.md5):
    deque(imap(_md5.update, repeat("foo".encode('utf-8'), 100000000)), maxlen=0)

def run_deque_imap_and_repeat():
    collections.deque(map(_md5.update, itertools.repeat("foo".encode('utf-8'), 100000000)), maxlen=0)

def run_any_like_presentation():
    any(map(_md5.update, itertools.repeat("foo".encode('utf-8'), 100000000)))

def run_any_like_presentation_all_local(any=any, deque=collections.deque, imap=map,
                                        _md5=_md5, repeat=itertools.repeat):
    any(imap(_md5.update, repeat("foo".encode('utf-8'), 100000000)))

import cProfile
import pstats

def performance_test(a_func):
    cProfile.run(a_func, 'stats')
    p = pstats.Stats('stats')
    p.sort_stats('time').print_stats(10)

performance_test('run_reference()')
performance_test('run_deque_imap_and_repeat_all_local()')
performance_test('run_deque_local_imap_and_repeat()')
performance_test('run_deque_imap_and_repeat()')
performance_test('run_any_like_presentation()')
performance_test('run_any_like_presentation_all_local()')
</code></pre>
<p>One more thing: don't do this in a real project unless there is a measured performance bottleneck.</p>
<p>And finally, if we really need a carrot (aside from writing code that makes sense and isn't prone to error) for those hard times when any actually outperforms deque: your more sensible code will be in a better position to take advantage of improvements in newer versions of Python without you having to modify your code base.</p>
<p><a href="http://www.python.org/doc/essays/list2str/" rel="nofollow">http://www.python.org/doc/essays/list2str/</a> is a good read on how to approach Python optimization (i.e., ideally, writing a C extension is NOT the first thing you reach for).</p>
<p>I'd also like to point out Gareth's answer, as he may be onto why any can outperform deque.</p>
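<p>To make the any gotcha concrete, here is a small Python 3 sketch. The names worker, run_with_any and run_with_deque are hypothetical and not part of the benchmark. any pulls items only until the mapped function first returns something truthy, whereas collections.deque(..., maxlen=0) drains the iterator completely regardless of the return values. hashlib's update() always returns None, which is why the any version of the benchmark happens to process every item.</p>

```python
import collections

processed = []  # every item the worker actually receives

def worker(item):
    """Hypothetical worker: records the item as its side effect, then returns it."""
    processed.append(item)
    return item  # truthy for any nonzero item

def run_with_any(items):
    processed.clear()
    any(map(worker, items))  # stops at the first truthy return value
    return list(processed)

def run_with_deque(items):
    processed.clear()
    collections.deque(map(worker, items), maxlen=0)  # consumes everything, stores nothing
    return list(processed)

items = [0, 0, 5, 0, 0]
print(run_with_any(items))    # [0, 0, 5]
print(run_with_deque(items))  # [0, 0, 5, 0, 0]
```

<p>The deque(..., maxlen=0) idiom is the same one the itertools documentation uses in its consume recipe, so it states the intent ("run this iterator for its side effects") more directly than any does.</p>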
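<p>The default-argument trick quoted from the itertools documentation works because default values are evaluated once, when the def statement runs, and are then read as local variables inside the function body. The Python 3 sketch below uses the standard dis module to list which names each version still resolves through global or builtin lookups. drain_globals, drain_locals and global_names are hypothetical names, N is shrunk to keep it quick, and binding update directly (instead of the benchmark's _md5=_md5) is my own variation.</p>

```python
import collections
import dis
import hashlib
import itertools

_md5 = hashlib.md5()
N = 10_000  # much smaller than the 100000000 used in the benchmark

def drain_globals():
    # collections, map, _md5, itertools and N are all global/builtin lookups here
    collections.deque(map(_md5.update, itertools.repeat(b"foo", N)), maxlen=0)

def drain_locals(deque=collections.deque, repeat=itertools.repeat, update=_md5.update):
    # the defaults were evaluated once at def time; here they are plain locals
    deque(map(update, repeat(b"foo", N)), maxlen=0)

def global_names(func):
    """Return the set of names func loads with LOAD_GLOBAL (globals and builtins)."""
    return {ins.argval for ins in dis.get_instructions(func) if ins.opname == "LOAD_GLOBAL"}

drain_globals()
drain_locals()
print(sorted(global_names(drain_globals)))  # ['N', '_md5', 'collections', 'itertools', 'map']
print(sorted(global_names(drain_locals)))   # ['N', 'map']
```

<p>With the loop itself running in C, these lookups happen once per call rather than once per item, which may be part of why the gains from this trick were small and inconsistent.</p>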
 

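<p>The timings above were collected under cProfile, which adds overhead to every call it records. In run_reference each _md5.update call is made from Python code, which is likely why the profiler reports 100000004 function calls there, so that row may be inflated relative to the imap/map versions. As a cross-check, here is a hedged Python 3 sketch that first verifies each approach produces the same MD5 digest and then times it with timeit rather than a profiler. The function names, the h and n parameters, and the use of a fresh md5 object per run are my assumptions, not the original test setup.</p>

```python
import collections
import hashlib
import itertools
import timeit

def run_reference(h, n):
    for _ in range(n):
        h.update(b"foo")

def run_deque_map_and_repeat(h, n):
    collections.deque(map(h.update, itertools.repeat(b"foo", n)), maxlen=0)

def run_any_map_and_repeat(h, n):
    # only processes every item because hashlib's update() always returns None
    any(map(h.update, itertools.repeat(b"foo", n)))

VARIANTS = (run_reference, run_deque_map_and_repeat, run_any_map_and_repeat)

def digest_of(variant, n):
    """Run one variant on a fresh md5 object and return the hex digest."""
    h = hashlib.md5()
    variant(h, n)
    return h.hexdigest()

def time_variant(variant, n, repeat=3):
    """Best-of-`repeat` wall time in seconds, using a fresh md5 object each run."""
    return min(timeit.repeat(lambda: variant(hashlib.md5(), n), number=1, repeat=repeat))

if __name__ == "__main__":
    n = 200_000  # far smaller than the 100000000 used above
    expected = hashlib.md5(b"foo" * n).hexdigest()  # same bytes fed in a single call
    for v in VARIANTS:
        assert digest_of(v, n) == expected
        print(f"{v.__name__}: {time_variant(v, n):.4f} s")
```

<p>Because each variant hashes into its own fresh object, the digest check also confirms that the any version really processed every item before any timing is trusted.</p>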