StackOverflow2013

<p><strong>Executive summary</strong>: use 5.010's <code>/p</code> instead. The performance of <code>$&</code> is about the same for a single match or substitution, but the entire program can suffer from it. Its slowdown is long-range, not local.</p>

<hr>

<p>Here's a benchmark with 5.010, which I suspect you are using since you have <code>say</code> in there. Note that 5.010 has a new <code>/p</code> flag that supplies a <code>${^MATCH}</code> variable that acts like <code>$&</code> but for only one instance of the match or substitution operator.</p>

<p>As with any benchmark, I compare with a control to set the baseline so I know how much time the boring bits take up. Also, this benchmark has a trap: you can't use <code>$&</code> anywhere in the code or every substitution suffers. First, run the benchmark without the <code>$&</code> sub:</p>

<pre><code>use 5.010;

use Benchmark qw(cmpthese);

cmpthese(1_000_000, {
    'control'   => sub { my $_ = 'abc123def'; s/\d+/246/ },
    'control-e' => sub { my $_ = 'abc123def'; s/\d+/123*2/e; },
    '/p'        => sub { my $_ = 'abc123def'; s/\d+/${^MATCH}*2/pe },
    # '$&'      => sub { my $_ = 'abc123def'; s/\d+/$&*2/e },
    '()'        => sub { my $_ = 'abc123def'; s/(\d+)/$1*2/e },
});
</code></pre>

<p>On my MacBook Air running Leopard and a vanilla Perl 5.10:</p>

<pre><code>               Rate        /p        () control-e   control
/p          70621/s        --       -1%      -58%      -78%
()          71124/s        1%        --      -58%      -78%
control-e  168350/s      138%      137%        --      -48%
control    322581/s      357%      354%       92%        --
</code></pre>

<p>Notice the big slowdown with the <code>/e</code> option, which I've added just for giggles.</p>

<p>Now, I'll uncomment the <code>$&</code> branch, and I see that everything is slower, although <code>/p</code> seems to shine here:</p>

<pre><code>               Rate        ()        $&        /p control-e   control
()          68353/s        --       -4%       -7%      -58%      -74%
$&          70872/s        4%        --       -3%      -56%      -73%
/p          73421/s        7%        4%        --      -54%      -72%
control-e  161290/s      136%      128%      120%        --      -39%
control    262467/s      284%      270%      257%       63%        --
</code></pre>

<p>This is an odd benchmark.
If I don't include the <code>control-e</code> sub, the situation looks different, which demonstrates another concept of benchmarking: it's not absolute, and everything that you do matters in the final results. In this run, <code>$&</code> looks slightly faster:</p>

<pre><code>               Rate        ()        /p        $&   control
()          69686/s        --       -3%       -3%      -72%
/p          72098/s        3%        --       -0%      -71%
$&          72150/s        4%        0%        --      -71%
control    251256/s      261%      248%      248%        --
</code></pre>

<p>So, I ran it with <code>control-e</code> again, and the results move around a little:</p>

<pre><code>               Rate        ()        /p        $& control-e   control
()          68306/s        --       -3%       -4%      -55%      -74%
/p          70175/s        3%        --       -1%      -54%      -73%
$&          71023/s        4%        1%        --      -53%      -73%
control-e  151976/s      122%      117%      114%        --      -41%
control    258398/s      278%      268%      264%       70%        --
</code></pre>

<p>The speed differences in each aren't impressive either. Anything under about 7% isn't that significant, since that difference comes from the accumulation of errors through the repeated calls to the sub (try it sometime by benchmarking the same code against itself). The slight differences you see come merely from the benchmarking infrastructure. With these numbers, each technique is virtually the same speedwise. You can't just run your benchmark once. You have to run it several times to see if you get repeatable results.</p>

<p>Note that although <code>/p</code> looks very slightly slower, it's also slower because <code>$&</code> cheats by messing up everyone. Notice the slowdown in the control too. This is one of the reasons that benchmarking is so dangerous. You can easily mislead yourself with the results if you don't think hard about why they are wrong (see the full screed in <a href="http://oreilly.com/catalog/9780596527242/" rel="nofollow noreferrer">Mastering Perl</a>, where I devote an entire chapter to this).</p>

<p>This simple and naïve benchmark excludes the killer disfeature of <code>$&</code>, though. Let's modify the benchmark to handle an additional match.
First, the baseline with no <code>$&</code> effects, where I've constructed a situation where <code>$&</code> would have to copy about 1,000 characters in an additional match operator:</p>

<pre><code>use 5.010;

use Benchmark qw(cmpthese);

$main::long = ( 'a' x 1_000 ) . '123' . ( 'b' x 1_000 );

cmpthese(1_000_000, {
    'control'   => sub { my $_ = 'abc123def'; s/\d+/246/; $main::long =~ m/^a+123/; },
    'control-e' => sub { my $_ = 'abc123def'; s/\d+/123*2/e; $main::long =~ m/^a+123/; },
    '/p'        => sub { my $_ = 'abc123def'; s/\d+/${^MATCH}*2/pe; $main::long =~ m/^a+123/; },
    #'$&'       => sub { my $_ = 'abc123def'; s/\d+/$&*2/e; $main::long =~ m/^a+123/;},
    '()'        => sub { my $_ = 'abc123def'; s/(\d+)/$1*2/e; $main::long =~ m/^a+123/; },
});
</code></pre>

<p>Everything is much slower than before, but that's what happens when you do more work, and again the two techniques are within each other's noise:</p>

<pre><code>               Rate        ()        /p control-e   control
()          52826/s        --       -4%      -49%      -63%
/p          54885/s        4%        --      -47%      -61%
control-e  103734/s       96%       89%        --      -27%
control    141243/s      167%      157%       36%        --
</code></pre>

<p>Now, I uncomment the <code>$&</code> sub:</p>

<pre><code>               Rate        ()        $&        /p control-e   control
()          50607/s        --       -1%       -3%      -43%      -59%
$&          50968/s        1%        --       -2%      -43%      -58%
/p          52274/s        3%        3%        --      -41%      -57%
control-e   89206/s       76%       75%       71%        --      -27%
control    122100/s      141%      140%      134%       37%        --
</code></pre>

<p>That result is very interesting. Now <code>/p</code>, still penalized by the cheating <code>$&</code>, is slightly faster (although still within the noise), and everyone suffers significantly.</p>

<p>Again, be very careful with these results. This does not mean that for every script, <code>$&</code> will have the same effect. You might see less of a slowdown, or more of it, depending on the number of matches, the particular regexes, and so on. What this, or any, benchmark shows is an idea, not a decision. You still have to figure out how this idea affects your particular situation.</p>
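<p>As a quick sanity check that the <code>/p</code> and capture-group versions really do the same work, here is a minimal sketch. The sub names <code>double_with_capture</code> and <code>double_with_p</code> are made up for this illustration, and it uses a named lexical instead of <code>my $_</code> so that it also runs on newer perls. It deliberately never mentions the match variable that <code>/p</code> replaces, so it doesn't trigger the program-wide penalty.</p>

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: double the first run of digits via a capture group.
sub double_with_capture {
    my ($text) = @_;
    $text =~ s/(\d+)/$1*2/e;
    return $text;
}

# Hypothetical helper: same substitution, but using /p and ${^MATCH}
# so no capture group is needed.
sub double_with_p {
    my ($text) = @_;
    $text =~ s/\d+/${^MATCH}*2/pe;
    return $text;
}

say double_with_capture('abc123def');   # abc246def
say double_with_p('abc123def');         # abc246def
```

<p>Both subs return the same string, so any difference you measure between them is about speed, not behavior.</p>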
