StackOverflow2013

<p>First, a disclaimer: Any attempt to slice and dice XML with regular expressions is fragile; a real XML parser would do better.</p>
<p>The pattern:</p>
<pre><code>\(<Annotation\(\s*\w\+="[^"]\{-}"\s\{-}\)*>\)\@<=\(\(<\/Annotation\)\@!\_.\)\{-}"MATCH\_.\{-}\(<\/Annotation>\)\@=</code></pre>
<p>Let's break it down...</p>
<p>Group 1 is <code><Annotation\(\s*\w\+="[^"]\{-}"\s\{-}\)*></code>. It matches the start-tag of the Annotation element. Group 2, which is embedded in Group 1, matches an attribute and may be repeated 0 or more times.</p>
<p>Group 2 is <code>\s*\w\+="[^"]\{-}"\s\{-}</code>. Most of these pieces are commonly used; the most unusual is <code>\{-}</code>, which means non-greedy repetition (<code>*?</code> in Perl-compatible regular expressions). The non-greedy whitespace match at the end is important for performance. With a greedy <code>\s*</code> there instead, Vim would try every possible way to split the whitespace separating two attributes between the <code>\s*</code> at the end of one occurrence of Group 2 and the <code>\s*</code> at the beginning of the next.</p>
<p>Group 1 is followed by <code>\@<=</code>. This is a zero-width positive look-behind. It prevents the start-tag from being included in the matched text (e.g., in an <code>:s///</code> substitution).</p>
<p>Group 3 is <code>\(<\/Annotation\)\@!\_.</code>. It includes Group 4, which matches the beginning of the Annotation end-tag. The <code>\@!</code> is a zero-width negative look-ahead and <code>\_.</code> matches any character (including newlines). Together, this group matches any single character except one at which the Annotation end-tag starts. Group 3 is followed by a non-greedy repetition marker <code>\{-}</code> so that it matches the smallest block of text before MATCH. If you were to use <code>\_.</code> instead of Group 3, the matched text could include the end-tag of an Annotation element that did <em>not</em> include MATCH and continue through into the next Annotation element with MATCH. 
(Try it.)</p>
<p>The next bit is straightforward: Find MATCH and a minimal number of other characters before the end-tag.</p>
<p>Group 5 is easy: It's the end-tag. <code>\@=</code> is a zero-width positive look-ahead, which is included here for the same reason as the <code>\@<=</code> for the start-tag. We have to repeat <code><\/Annotation</code> rather than use <code>\4</code> because groups with zero-width modifiers aren't captured.</p>
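<p>To see why Group 3 matters outside of Vim, here is a rough Python sketch of the same idea. It is an approximation, not a translation: Python's <code>re</code> module only allows fixed-width look-behind, so the start-tag is consumed and the body is captured in a group instead of using <code>\@<=</code>. The sample XML and the names <code>naive</code> and <code>tempered</code> are made up for illustration.</p>

```python
import re

# Hypothetical sample: the first Annotation lacks "MATCH, the second has it.
xml = (
    '<Annotation id="a">\nplain text\n</Annotation>\n'
    '<Annotation id="b">\nsome "MATCH here\n</Annotation>\n'
)

# Rough equivalents of Group 1 (start-tag) and Group 5 (end-tag).
START = r'<Annotation(?:\s*\w+="[^"]*?"\s*?)*>'
END = r'</Annotation>'

# Plain non-greedy "any character": free to run across an end-tag.
naive = re.compile(
    START + r'(.*?"MATCH.*?)(?=' + END + ')', re.DOTALL
)

# Group 3 equivalent: any character, but never at the start of an end-tag.
tempered = re.compile(
    START + r'((?:(?!</Annotation).)*?"MATCH.*?)(?=' + END + ')', re.DOTALL
)

naive_body = naive.search(xml).group(1)
tempered_body = tempered.search(xml).group(1)

print(repr(naive_body))     # starts in the first element and spills into the second
print(repr(tempered_body))  # stays inside the element that contains "MATCH
```

<p>With this sample, the naive version starts matching in the first element and runs through its end-tag, while the version with the negative look-ahead only matches inside the second element.</p>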
