StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>A simple, nontechnical explanation, paraphrased from Jeffrey Friedl's book <a href="http://oreilly.com/catalog/9780596528126/" rel="nofollow">Mastering Regular Expressions</a>. </p> <p><strong>CAVEAT</strong>:</p> <p>While this book is generally considered the "regex bible", there appears some controversy as to whether the distinction made here between DFA and NFA is actually correct. I'm not a computer scientist, and I don't understand most of the theory behind what really is a "regular" expression, deterministic or not. After the controversy started, I deleted this answer because of this, but since then it has been referenced in comments to other answers. I would be very interested in discussing this further - can it be that Friedl really is wrong? Or did I get Friedl wrong (but I reread that chapter yesterday evening, and it's just like I remembered...)? </p> <p><strong>Edit:</strong> It appears that Friedl and I are indeed wrong. Please check out Eamon's excellent comments below.</p> <hr> <p><strong>Original answer:</strong></p> <p>A <strong>DFA engine</strong> steps through the <em>input string</em> character by character and tries (and remembers) all possible ways the regex could match at this point. If it reaches the end of the string, it declares success. </p> <p>Imagine the string <code>AAB</code> and the regex <code>A*AB</code>. We now step through our string letter by letter.</p> <ol> <li><p><code>A</code>: </p> <ul> <li>First branch: Can be matched by <code>A*</code>.</li> <li>Second branch: Can be matched by ignoring the <code>A*</code> (zero repetitions are allowed) and using the second <code>A</code> in the regex.</li> </ul></li> <li><p><code>A</code>:</p> <ul> <li>First branch: Can be matched by expanding <code>A*</code>.</li> <li>Second branch: Can't be matched by <code>B</code>. Second branch fails. But:</li> <li>Third branch: Can be matched by not expanding <code>A*</code> and using the second <code>A</code> instead.</li> </ul></li> <li><p><code>B</code>:</p> <ul> <li>First branch: Can't be matched by expanding <code>A*</code> or by moving on in the regex to the next token <code>A</code>. First branch fails.</li> <li>Third branch: Can be matched. Hooray!</li> </ul></li> </ol> <p>A DFA engine never backtracks in the string. </p> <hr> <p>An <strong>NFA engine</strong> steps through the <em>regex</em> token by token and tries all possible permutations on the string, backtracking if necessary. If it reaches the end of the regex, it declares success.</p> <p>Imagine the same string and the same regex as before. We now step through our regex token by token:</p> <ol> <li><code>A*</code>: Match <code>AA</code>. Remember the backtracking positions 0 (start of string) and 1.</li> <li><code>A</code>: Doesn't match. But we have a backtracking position we can return to and try again. The regex engine steps back one character. Now <code>A</code> matches. </li> <li><code>B</code>: Matches. End of regex reached (with one backtracking position to spare). Hooray!</li> </ol>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload