<p>I just tried this myself. For the simple case of parsing a number out of a string and summing it, using a compiled regular expression object is about twice as fast as using the module-level <code>re</code> functions.</p>

<p>As others have pointed out, the <code>re</code> functions (including <code>re.compile</code>) look up the regular expression string in a cache of previously compiled expressions. Therefore, in the normal case, the extra cost of using the <code>re</code> functions is simply the cost of the cache lookup.</p>

<p>However, examination of the <a href="http://www.google.com/codesearch/p?hl=en#1IKf2ZWr9OM/tools/third_party/python_26/Lib/re.py&amp;q=lang:python%20re.py&amp;sa=N&amp;cd=4&amp;ct=rc" rel="noreferrer">code</a> shows that the cache is limited to 100 expressions. This raises the question: how painful is it to overflow the cache? The code contains an internal interface to the regular expression compiler, <code>re.sre_compile.compile</code>. If we call it, we bypass the cache. Compiling this way on every iteration turns out to be about two orders of magnitude slower for a basic regular expression, such as <code>r'\w+\s+([0-9_]+)\s+\w*'</code>.</p>

<p>Here's my test (Python 2):</p>

<pre><code>#!/usr/bin/env python
import re
import time

def timed(func):
    def wrapper(*args):
        t = time.time()
        result = func(*args)
        t = time.time() - t
        print '%s took %.3f seconds.' % (func.func_name, t)
        return result
    return wrapper

regularExpression = r'\w+\s+([0-9_]+)\s+\w*'
testString = "average    2 never"

@timed
def noncompiled():
    a = 0
    for x in xrange(1000000):
        m = re.match(regularExpression, testString)
        a += int(m.group(1))
    return a

@timed
def compiled():
    a = 0
    rgx = re.compile(regularExpression)
    for x in xrange(1000000):
        m = rgx.match(testString)
        a += int(m.group(1))
    return a

@timed
def reallyCompiled():
    a = 0
    rgx = re.sre_compile.compile(regularExpression)
    for x in xrange(1000000):
        m = rgx.match(testString)
        a += int(m.group(1))
    return a

@timed
def compiledInLoop():
    a = 0
    for x in xrange(1000000):
        rgx = re.compile(regularExpression)
        m = rgx.match(testString)
        a += int(m.group(1))
    return a

@timed
def reallyCompiledInLoop():
    a = 0
    for x in xrange(10000):
        rgx = re.sre_compile.compile(regularExpression)
        m = rgx.match(testString)
        a += int(m.group(1))
    return a

r1 = noncompiled()
r2 = compiled()
r3 = reallyCompiled()
r4 = compiledInLoop()
r5 = reallyCompiledInLoop()
print "r1 = ", r1
print "r2 = ", r2
print "r3 = ", r3
print "r4 = ", r4
print "r5 = ", r5
</code></pre>

<p>And here is the output on my machine:</p>

<pre><code>$ regexTest.py
noncompiled took 4.555 seconds.
compiled took 2.323 seconds.
reallyCompiled took 2.325 seconds.
compiledInLoop took 4.620 seconds.
reallyCompiledInLoop took 4.074 seconds.
r1 = 2000000
r2 = 2000000
r3 = 2000000
r4 = 2000000
r5 = 20000
</code></pre>

<p>The 'reallyCompiled' functions use the internal interface, which bypasses the cache. Note that <code>reallyCompiledInLoop</code>, the one that compiles on each loop iteration, runs only 10,000 iterations, not one million, which is why <code>r5</code> is 20000 rather than 2000000.</p>
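<p>For readers on Python 3, here is a minimal sketch of the same comparison, not a port of the benchmark above. It assumes CPython, where the cache hands back the same pattern object for a repeated pattern string (an implementation detail, not a documented guarantee), and it uses the documented <code>re.purge()</code> to empty the cache instead of the internal compiler interface. The function names and the iteration count are my own choices for illustration.</p>

```python
import re
import timeit

PATTERN = r'\w+\s+([0-9_]+)\s+\w*'
TEXT = "average 2 never"


def sum_with_module_function(n):
    """Call re.match each time; every call pays for a cache lookup."""
    total = 0
    for _ in range(n):
        total += int(re.match(PATTERN, TEXT).group(1))
    return total


def sum_with_compiled(n):
    """Compile once, then reuse the pattern object's match method."""
    rgx = re.compile(PATTERN)
    total = 0
    for _ in range(n):
        total += int(rgx.match(TEXT).group(1))
    return total


def cache_returns_same_object():
    """True if two re.compile calls with the same pattern share one object.

    Assumption: relies on CPython's internal cache behavior.
    """
    return re.compile(PATTERN) is re.compile(PATTERN)


def purge_forces_recompile():
    """True if re.purge() makes the next re.compile build a new object."""
    before = re.compile(PATTERN)
    re.purge()  # documented call that clears the regular expression cache
    after = re.compile(PATTERN)
    return before is not after


if __name__ == "__main__":
    n = 100000  # arbitrary iteration count chosen for this sketch
    for func in (sum_with_module_function, sum_with_compiled):
        seconds = timeit.timeit(lambda: func(n), number=1)
        print("%s took %.3f seconds." % (func.__name__, seconds))
    print("cache hit returns same object:", cache_returns_same_object())
    print("purge forces recompile:", purge_forces_recompile())
```

<p>Timings will differ by machine and Python version, so the sketch prints them rather than asserting any particular ratio.</p>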