Note that there are some explanatory texts on larger screens.

plurals
  1. POCan DFA regex engines handle atomic groups?
    text
    copied!<p>According to <a href="http://useless-factor.blogspot.de/2008/05/regexp-research.html" rel="nofollow">this page</a> (and some others), DFA regex engines can deal with capturing groups rather well. I'm curious about atomic groups (or possessive quantifiers), as I recently used them a lot and can't imagine how this could be done.</p> <hr> <p>I disagree with the fist part of the answer:</p> <blockquote> <p>A DFA does not need to deal with constructs like atomic grouping.... Atomic Grouping is a way to help the engine finish a match, that would otherwise cause endless backtracking</p> </blockquote> <p>Atomic groups are important not only for speed of NFA engines, but they also allow to write simpler and less error-prone regexes. Let's say I needed to find all C-style multiline comments in a program. The exact regex would be something like:</p> <ul> <li>start with the literal <code>/*</code></li> <li>eat anything of the following <ul> <li>any char except <code>*</code></li> <li>a <code>*</code> followed by anything but <code>/</code></li> </ul></li> <li>repeat this as much as possible</li> <li>end with the literal <code>*/</code></li> </ul> <p>This sounds a bit complicated, the regex</p> <pre><code>/\* ( [^*] | \*[^/] )+ \*/ </code></pre> <p>is complicated and wrong (it doesn't handle <code>/* foo **/</code> correctly). Using a reluctant (lazy) quantifier is better</p> <pre><code>/\* .*? \*/ </code></pre> <p>but also wrong as it can eat the whole line</p> <pre><code>/* foo */ @#$!!**@#$ /* bar */ </code></pre> <p>when backtracking due to a later sub-expression failing on the garbage occurs. Putting the above in an atomic group solves the problem nicely:</p> <pre><code>(?&gt; /\* .*? \*/ ) </code></pre> <p>This works always (I hope) and is as fast as possible (for NFA). So I wonder if a DFA engine could somehow handle it.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload