<p>A single character is a valid regular expression. A single character that is not "magic" matches itself. If you can identify a single character that will never, ever appear in your text, you could make a pattern from that.</p>
<p>How about ASCII NUL, character 0?</p>
<p>I stuck in one more string in your test program, the string: <code>'\0'</code></p>
<p>It was about as fast as your best pattern: <code>b(?&lt;!b)</code></p>
<p>Okay, you already have a character after the end of the string. How about a character <em>before</em> the start of the string? That's impossible: <code>'x^'</code></p>
<p>Aha! That's faster than checking for a character after end of string. But it's about as fast as your best pattern.</p>
<p>I suggest replacing the <code>b</code> with an ASCII NUL and calling it good. When I tried that pattern: <code>\0(?&lt;!\0)</code></p>
<p>It wins by a tiny fraction. But really, on my computer, all the ones discussed above are so close together that there isn't much to distinguish them.</p>
<p>Results:</p>
<pre><code>Pattern                        Time
\0(?&lt;!\0)                      0.098
\0                             0.099
x^                             0.099
b(?&lt;!b)                        0.099
^(?&lt;=x)                        1.416
$b                             1.446
$a                             1.447
\Za                            1.462
\Zb                            1.465
[^\s\S]                        2.280
a(?&lt;!a)                        2.843
</code></pre>
<p>That was <em>fun</em>. Thanks for posting the question.</p>
<p>EDIT: Ah <em>hah</em>! I rewrote the program to test with real input data, and got a different result.</p>
<p>I downloaded "The Complete Works of William Shakespeare" from Project Gutenberg as a text file. (Weird, it gave an error on <code>wget</code> but let my browser get it... some sort of measure to protect against automated copying?) URL: <a href="http://www.gutenberg.org/cache/epub/100/pg100.txt" rel="nofollow">http://www.gutenberg.org/cache/epub/100/pg100.txt</a></p>
<p>Here are the results, followed by the modified program as I ran it.</p>
<pre><code>Pattern                        Time
\0(?&lt;!\0)                      0.110
\0                             0.118
x^                             0.119
b(?&lt;!b)                        0.143
a(?&lt;!a)                        0.275
^(?&lt;=x)                        1.577
$b                             1.605
$a                             1.611
\Za                            1.634
\Zb                            1.634
[^\s\S]                        2.441
</code></pre>
<p>So yeah I'm definitely going with that first one.</p>
<pre><code>#!/usr/bin/env python

import re
import time

tests = [
    r'x^',
    r'\0',
    r'[^\s\S]',
    r'^(?&lt;=x)',
    r'a(?&lt;!a)',
    r'b(?&lt;!b)',
    r'\0(?&lt;!\0)',
    r'\Za',
    r'\Zb',
    r'$a',
    r'$b'
]

timing = []
#text = 'a' * 50000000
text = open("/tmp/pg100.txt").read()
text = text * 10

for t in tests:
    pat = re.compile(t)
    start = time.time()
    pat.search(text)
    dur = time.time() - start
    timing.append((t, dur))

timing.sort(key=lambda x: x[1])
print('%-30s %s' % ('Pattern', 'Time'))
for t, dur in timing:
    print('%-30s %0.3f' % (t, dur))
</code></pre>
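<p>The timings only matter if the patterns really never match, so here is a minimal sketch of a sanity check. The helper <code>never_matches</code> and the <code>SAMPLES</code> list are made up for illustration and are not part of the program above. It also shows why plain <code>\0</code> is the odd one out: it is only safe if the text is guaranteed to contain no NUL character.</p>

```python
import re

# Patterns from the answer that should fail on ANY input (no flags set).
NEVER_MATCH = [
    r'x^', r'\0(?<!\0)', r'b(?<!b)', r'a(?<!a)', r'[^\s\S]',
    r'^(?<=x)', r'\Za', r'\Zb', r'$a', r'$b',
]

# Hypothetical sample inputs, chosen to include the characters the
# patterns mention (a, b, x, NUL, newlines).
SAMPLES = ['', 'a', 'b', 'x', 'ab\nab', 'x^', '\0', 'a\0b', '$a', 'xa\nb']


def never_matches(pattern, samples):
    """Return True if the pattern finds no match in any sample string."""
    compiled = re.compile(pattern)
    return all(compiled.search(s) is None for s in samples)


results = {p: never_matches(p, SAMPLES) for p in NEVER_MATCH}

# Plain \0 is data-dependent: it matches as soon as a NUL shows up.
nul_hit = re.search(r'\0', 'a\0b') is not None

for pattern, ok in results.items():
    print('%-12s %s' % (pattern, 'never matched' if ok else 'MATCHED'))
print('%-12s %s' % (r'\0', 'matched a NUL in the data' if nul_hit else 'no match'))
```

<p>A handful of samples proves nothing in general, of course, but it catches a mistyped pattern before you spend time benchmarking it.</p>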