StackOverflow2013

<p>The question is not why the fourth test hangs, but why the first three don't. The first string starts with a space and the second starts with <code>Contacts</code>; neither can match the <code>^Line</code> at the start of the regex, so the first two match attempts fail immediately. The third string matches the regex, and although that takes much longer than it should (for reasons I'm about to explain), it still seems instantaneous.</p>
<p>The fourth match fails because the string doesn't match the last part of the regex, <code>tst0063$</code>. When that fails, the regex engine backtracks into the variable portion of the regex, <code>(\s*\S*)*</code>, and starts trying all the different ways to fit that onto the string. Unlike with the third string, this time it has to try every possible combination of zero or more whitespace characters (<code>\s*</code>) followed by zero or more non-whitespace characters (<code>\S*</code>), repeated zero or more times, before it can give up. Because any run of non-whitespace characters can be split between iterations at any position, the number of combinations grows exponentially with the length of the string. The possibilities aren't infinite, but they might as well be.</p>
<p>You were probably thinking of <code>[\s\S]*</code>, which is a well-known idiom for matching any character <em>including newlines</em>. It's used in JavaScript, which (until ES2018 added the <code>s</code> flag) had no way to make the dot (<code>.</code>) match line separator characters. Most other flavors let you specify a matching mode that changes the behavior of the dot; some call it <em>DOTALL</em> mode, but .NET uses the more common name, <em>Singleline</em>:</p>
<pre><code>// Singleline makes . match \n as well
string sPattern = @"^Line.*tst0063$";
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
</code></pre>
<p>You can also use <em>inline</em> modifiers:</p>
<pre><code>// (?i) = IgnoreCase, (?s) = Singleline
string sPattern = @"(?is)^Line.*tst0063$";
</code></pre>
<p><strong>UPDATE:</strong> In response to your comment: yes, it does seem odd that the regex engine can't tell that any match <em>must</em> end with <code>tst0063</code>. But it's not always so easy to tell.
How much effort should it put into looking for shortcuts like that? And how many shortcuts can you bolt onto the normal matching algorithm before <em>all</em> matches (successful as well as failed) become too slow?</p>
<p>.NET has one of the best regex implementations out there: fast, powerful, and with some truly amazing features. But you have to think about what you're telling it to do. For example, if you know there has to be at least one of something, use <code>+</code>, not <code>*</code>. If you had followed that rule, you wouldn't have had this problem. This regex:</p>
<pre><code>@"^Line(\s+\S+)*tst0063$"
</code></pre>
<p>...works just fine. <code>(\s+\S+)*</code> is a perfectly reasonable way to match zero or more words, where words are defined as one or more non-whitespace characters, separated from other words by one or more whitespace characters. Because every iteration has to start with whitespace and contain at least one non-whitespace character, iterations can only begin at word boundaries, so the engine no longer has an exponential number of ways to split the text. (Is that what you were trying to do?)</p>
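<p>To get a feel for why the failed match "might as well" run forever, here is a back-of-the-envelope model in C#. It is a simplification under stated assumptions, not a description of how the .NET engine works internally: it only counts the ways a single run of <em>n</em> non-whitespace characters can be divided among successive iterations of <code>(\s*\S*)*</code> when every iteration takes a non-empty piece. That count is the number of compositions of <em>n</em>, which is 2<sup>n-1</sup>; the real engine also tries leaving characters over for <code>tst0063</code>, so it has at least this much to explore. <code>CountSplits</code> is a hypothetical helper name.</p>

```csharp
using System;
using System.Numerics;

// Simplified model (an assumption, not the .NET engine's actual algorithm):
// count the ways a run of n non-whitespace characters can be divided among
// successive iterations of (\s*\S*)*, where every iteration takes a
// non-empty piece. This is the number of compositions of n.
static BigInteger CountSplits(int n)
{
    // ways[k] = number of ways to divide the first k characters into pieces
    var ways = new BigInteger[n + 1];
    ways[0] = 1; // the empty prefix can be divided in exactly one way
    for (int k = 1; k <= n; k++)
    {
        // The last piece covers characters j..k-1, for any j from 0 to k-1.
        for (int j = 0; j < k; j++)
        {
            ways[k] += ways[j];
        }
    }
    return ways[n];
}

foreach (int len in new[] { 5, 10, 20, 40 })
{
    Console.WriteLine($"{len,2} characters: {CountSplits(len)} ways");
}
```

<p>It prints 16, 512, 524288 and 549755813888 ways for 5, 10, 20 and 40 characters: every extra character doubles the count, which is why a long enough line never seems to finish.</p>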
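<p>To compare <code>*</code> and <code>+</code> without hanging your program, .NET lets you pass a match timeout to the <code>Regex</code> constructor; the engine then throws <code>RegexMatchTimeoutException</code> instead of backtracking indefinitely. This sketch assumes the question's full pattern was <code>^Line(\s*\S*)*tst0063$</code> (only its pieces are quoted in the answer), and the inputs and the <code>TryMatch</code> and <code>Describe</code> helpers are made up for illustration. How long the original pattern actually runs can vary with the .NET version, so for the failing input the sketch reports either "no match" or "timed out".</p>

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed reconstruction of the question's pattern: the answer only quotes
// its pieces (^Line, (\s*\S*)* and tst0063$).
const string Original = @"^Line(\s*\S*)*tst0063$";
// The answer's rewrite, with + instead of * inside the group.
const string Rewritten = @"^Line(\s+\S+)*tst0063$";

// Hypothetical helper: true/false for match/no match, or null when the
// engine hits the timeout and throws instead of finishing its backtracking.
static bool? TryMatch(string pattern, string input, TimeSpan timeout)
{
    var regex = new Regex(pattern, RegexOptions.IgnoreCase, timeout);
    try
    {
        return regex.IsMatch(input);
    }
    catch (RegexMatchTimeoutException)
    {
        return null;
    }
}

// Hypothetical helper that turns a TryMatch result into readable text.
static string Describe(bool? result) => result switch
{
    true => "match",
    false => "no match",
    null => "timed out",
};

// Hypothetical failing input: it starts with "Line" but ends in tst0064,
// so the match can only fail after the variable part has been explored.
string failing = "Line " + new string('x', 30) + " " + new string('y', 30) + " tst0064";
// Hypothetical matching input: tst0063 ends the last word, as in a path.
string matching = @"Line 42 C:\logs\tst0063";

var limit = TimeSpan.FromSeconds(1);
Console.WriteLine($"original,  failing:  {Describe(TryMatch(Original, failing, limit))}");
Console.WriteLine($"rewritten, failing:  {Describe(TryMatch(Rewritten, failing, limit))}");
Console.WriteLine($"rewritten, matching: {Describe(TryMatch(Rewritten, matching, limit))}");
```

<p>One thing to watch with the rewritten pattern: the loop can only stop right after a non-whitespace character (or right after <code>Line</code>), so <code>tst0063</code> has to be attached to the end of the last word, as in the path above. An input such as <code>Line foo tst0063</code>, with a space before <code>tst0063</code>, does not match it.</p>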
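<p>The point about the dot and newlines can be checked directly. In this sketch the two-line input is hypothetical: the plain dot cannot cross the newline, while <code>RegexOptions.Singleline</code>, the inline <code>(?is)</code> modifiers and the <code>[\s\S]</code> idiom all can. None of these patterns nests one quantifier inside another, so a failed match only has to back off the single <code>.*</code>, one character at a time.</p>

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a newline sits between "Line" and "tst0063".
string input = "Line one\ntwo tst0063";

// Default mode: . matches anything except \n, so it cannot cross the newline.
bool plainDot = Regex.IsMatch(input, @"^Line.*tst0063$", RegexOptions.IgnoreCase);
// Singleline mode (.NET's name for DOTALL): . matches \n too.
bool singleline = Regex.IsMatch(input, @"^Line.*tst0063$",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);
// The same two options written inline: (?i) = IgnoreCase, (?s) = Singleline.
bool inlineMods = Regex.IsMatch(input, @"(?is)^Line.*tst0063$");
// The [\s\S] idiom matches any character, newline included, with no options.
bool classIdiom = Regex.IsMatch(input, @"^Line[\s\S]*tst0063$");

Console.WriteLine($"plain dot: {plainDot}, Singleline: {singleline}, (?is): {inlineMods}, [\\s\\S]*: {classIdiom}");
```

<p>It prints <code>plain dot: False, Singleline: True, (?is): True, [\s\S]*: True</code>.</p>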
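<p>The "zero or more words" reading of <code>(\s+\S+)*</code> shows up in the captures .NET keeps for every iteration of a group (<code>Group.Captures</code>). The anchored pattern <code>^(\s+\S+)*$</code> and the input below are hypothetical, chosen to isolate that sub-pattern from the rest of the regex; each iteration comes out as one whitespace-prefixed word.</p>

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: three words, each preceded by some whitespace.
string words = " alpha  beta\tgamma";
Match m = Regex.Match(words, @"^(\s+\S+)*$");

// Group 1 keeps one Capture per loop iteration, in order.
foreach (Capture c in m.Groups[1].Captures)
{
    Console.WriteLine($"[{c.Value}]");
}
```

<p>The three captures are <code>" alpha"</code>, <code>"  beta"</code> and <code>"\tgamma"</code>: the whole whitespace run and the whole word end up in the same iteration.</p>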