<p>Python's compiler is deliberately dirt-simple -- this makes it fast and highly predictable. Apart from some constant folding, it basically generates bytecode that faithfully mimics your sources. Somebody else already suggested <a href="http://docs.python.org/library/dis.html?highlight=dis#module-dis" rel="noreferrer">dis</a>, and it's indeed a good way to look at the bytecode you're getting -- for example, how <code>for i in [1, 2, 3]:</code> isn't actually doing constant folding but generating the literal list on the fly, while <code>for i in (1, 2, 3):</code> (looping on a literal tuple instead of a literal list) <strong>is</strong> able to constant-fold (reason: a list is a mutable object, and to keep to the "dirt-simple" mission statement the compiler doesn't bother to check that this specific list is never modified, so it <em>could</em> be optimized into a tuple).</p>

<p>So there's ample space for manual micro-optimization -- hoisting, in particular. I.e., rewrite</p>

<pre><code>for x in whatever():
    anobj.amethod(x)
</code></pre>

<p>as</p>

<pre><code>f = anobj.amethod
for x in whatever():
    f(x)
</code></pre>

<p>to save the repeated lookups (the compiler doesn't check whether a run of <code>anobj.amethod</code> can actually change <code>anobj</code>'s bindings &amp;c so that a fresh lookup is needed next time -- it just does the dirt-simple thing, i.e., no hoisting, which guarantees correctness but definitely doesn't guarantee blazing speed;-).</p>
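<p>To see both effects at the bytecode level, here's a minimal <code>dis</code> sketch of my own (the exact opcodes and offsets vary between CPython versions, so the comments describe what you should typically observe rather than guaranteed output):</p>

<pre><code>import dis

# Looping on a literal tuple: the optimizer typically folds (1, 2, 3)
# into a single constant, so expect one LOAD_CONST of the whole tuple.
dis.dis(compile("for i in (1, 2, 3): pass", "tuple_loop", "exec"))

# Looping on a literal list: the list is rebuilt on the fly, so expect
# separate LOAD_CONSTs followed by a BUILD_LIST (very recent CPythons
# may optimize this case as well).
dis.dis(compile("for i in [1, 2, 3]: pass", "list_loop", "exec"))

# The hoisting example from above: 'whatever' and 'anobj' need not exist,
# since these snippets are only compiled, never run.  The unhoisted loop
# repeats the attribute lookup for 'amethod' on every iteration; the
# hoisted version performs it just once, before the loop.
dis.dis(compile("for x in whatever(): anobj.amethod(x)", "unhoisted", "exec"))
dis.dis(compile("f = anobj.amethod\nfor x in whatever(): f(x)", "hoisted", "exec"))
</code></pre>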
<p>The <a href="http://docs.python.org/library/timeit.html?highlight=timeit#module-timeit" rel="noreferrer">timeit</a> module (best used at a shell prompt IMHO) makes it very simple to measure the overall effects of compilation + bytecode interpretation (just ensure the snippet you're measuring has no side effects that would affect the timing, since <code>timeit</code> <strong>does</strong> run it over and over in a loop;-). For example:</p>

<pre><code>$ python -mtimeit 'for x in (1, 2, 3): pass'
1000000 loops, best of 3: 0.219 usec per loop
$ python -mtimeit 'for x in [1, 2, 3]: pass'
1000000 loops, best of 3: 0.512 usec per loop
</code></pre>

<p>you can see the costs of the repeated list construction -- and confirm that this is indeed what we're observing by trying a minor tweak:</p>

<pre><code>$ python -mtimeit -s'Xs=[1,2,3]' 'for x in Xs: pass'
1000000 loops, best of 3: 0.236 usec per loop
$ python -mtimeit -s'Xs=(1,2,3)' 'for x in Xs: pass'
1000000 loops, best of 3: 0.213 usec per loop
</code></pre>

<p>moving the iterable's construction to the <code>-s</code> setup (which is run only once and not timed) shows that the looping proper is slightly faster on tuples (maybe 10%), but the big issue with the first pair (list slower than tuple by over 100%) is mostly with the construction.</p>

<p>Armed with <code>timeit</code> and the knowledge that the compiler is deliberately very simple-minded in its optimizations, we can easily answer other questions of yours:</p>

<blockquote>
<p>How fast are the following operations (comparatively)</p>
<pre><code>* Function calls
* Class instantiation
* Arithmetic
* 'Heavier' math operations such as sqrt()
</code></pre>
</blockquote>

<pre><code>$ python -mtimeit -s'def f(): pass' 'f()'
10000000 loops, best of 3: 0.192 usec per loop
$ python -mtimeit -s'class o: pass' 'o()'
1000000 loops, best of 3: 0.315 usec per loop
$ python -mtimeit -s'class n(object): pass' 'n()'
10000000 loops, best of 3: 0.18 usec per loop
</code></pre>

<p>so we see: instantiating a new-style class and calling a function (both empty) are about the same speed, with instantiations possibly having a tiny speed margin, maybe 5%; instantiating an old-style class is slowest (by about 50%). Tiny differences such as 5% or less could of course be noise, so repeating each try a few times is advisable; but differences like 50% are definitely well beyond noise.</p>

<pre><code>$ python -mtimeit -s'from math import sqrt' 'sqrt(1.2)'
1000000 loops, best of 3: 0.22 usec per loop
$ python -mtimeit '1.2**0.5'
10000000 loops, best of 3: 0.0363 usec per loop
$ python -mtimeit '1.2*0.5'
10000000 loops, best of 3: 0.0407 usec per loop
</code></pre>

<p>and here we see: calling <code>sqrt</code> is slower than doing the same computation by operator (using the <code>**</code> raise-to-power operator) by roughly the cost of calling an empty function; all arithmetic operators are roughly the same speed to within noise (the tiny difference of 3 or 4 nanoseconds is definitely noise;-). Checking whether constant folding might interfere:</p>

<pre><code>$ python -mtimeit '1.2*0.5'
10000000 loops, best of 3: 0.0407 usec per loop
$ python -mtimeit -s'a=1.2; b=0.5' 'a*b'
10000000 loops, best of 3: 0.0965 usec per loop
$ python -mtimeit -s'a=1.2; b=0.5' 'a*0.5'
10000000 loops, best of 3: 0.0957 usec per loop
$ python -mtimeit -s'a=1.2; b=0.5' '1.2*b'
10000000 loops, best of 3: 0.0932 usec per loop
</code></pre>

<p>...we see that this is indeed the case: if either or both numbers are being looked up as variables (which blocks constant folding), we're paying the "realistic" cost. Variable lookup has its own cost:</p>

<pre><code>$ python -mtimeit -s'a=1.2; b=0.5' 'a'
10000000 loops, best of 3: 0.039 usec per loop
</code></pre>

<p>and that's far from negligible when we're trying to measure such tiny times anyway.</p>
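<p>If you'd rather drive such measurements from a script than from a shell prompt, <code>timeit</code> can be used programmatically too. A minimal sketch of my own (the snippets and the idea of subtracting a bare-lookup baseline are illustrative assumptions, and the subtraction gives only a rough estimate, not an exact accounting):</p>

<pre><code>import timeit

N = 1000000

def best_usec(stmt, setup="pass"):
    # Mimic the command line's "best of 3": run the statement N times,
    # repeat that three times, and keep the fastest, in usec per loop.
    return min(timeit.Timer(stmt, setup).repeat(3, N)) / N * 1e6

setup = "a=1.2; b=0.5"
mul = best_usec("a*b", setup)      # lookup a, lookup b, multiply
lookup = best_usec("a", setup)     # a single variable lookup

print("a*b:           %.4f usec" % mul)
print("bare lookup:   %.4f usec" % lookup)
# Rough estimate of the multiplication itself, net of the two lookups:
print("net multiply: ~%.4f usec" % (mul - 2 * lookup))
</code></pre>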
<p>Indeed <em>constant</em> lookup isn't free either:</p>

<pre><code>$ python -mtimeit -s'a=1.2; b=0.5' '1.2'
10000000 loops, best of 3: 0.0225 usec per loop
</code></pre>

<p>as you see, while smaller than variable lookup, it's quite comparable -- about half.</p>

<p>If and when (armed with careful profiling and measurement) you decide some nucleus of your computations desperately needs optimization, I recommend trying <a href="http://www.cython.org/" rel="noreferrer">cython</a> -- it's a C / Python merge which tries to be as neat as Python and as fast as C, and while it can't get there 100% it surely makes a good fist of it (in particular, it makes binary code that's quite a bit faster than you can get with its predecessor language, <a href="http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/" rel="noreferrer">pyrex</a>, as well as being a bit richer). For the last few percent of performance you probably still want to go down to C (or assembly / machine code in some exceptional cases), but that would be really, really rare.</p>
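<p>As for the "careful profiling and measurement" that should precede any such move: the standard library's <code>cProfile</code> and <code>pstats</code> are the usual starting point. A minimal sketch of my own -- the function names are made up purely for illustration:</p>

<pre><code>import cProfile
import pstats

def hot_nucleus(n):
    # stand-in for the computation you suspect is the bottleneck
    return sum(i * i for i in range(n))

def main():
    return [hot_nucleus(10000) for _ in range(200)]

cProfile.run("main()", "prof.out")              # profile and save raw stats
stats = pstats.Stats("prof.out")
stats.sort_stats("cumulative").print_stats(10)  # show the 10 costliest calls
</code></pre>

<p>Only once such a profile confirms where the time actually goes is it worth reaching for cython (or C) on that nucleus.</p>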