StackOverflow2013

<p>Since Java supports variable-length look-behinds (as long as their length is bounded), you could do it like this:</p>
<pre><code>import java.util.regex.*;

public class RegexTest {
    public static void main(String[] argv) {
        // Split at each ':' that is preceded by an even number of backslashes.
        Pattern p = Pattern.compile("(?<=(?<!\\\\)(?:\\\\\\\\){0,10}):");
        String text = "foo:bar\\:baz\\\\:qux\\\\\\:quux\\\\\\\\:corge";
        String[] parts = p.split(text);
        System.out.printf("Input string: %s\n", text);
        for (int i = 0; i < parts.length; i++) {
            System.out.printf("Part %d: %s\n", i+1, parts[i]);
        }
    }
}
</code></pre>
<ul>
<li><code>(?<=(?<!\\)(?:\\\\){0,10})</code> looks behind for an even number of backslashes (including zero, up to a maximum of 10 pairs, i.e. 20 backslashes).</li>
</ul>
<p>Output:</p>
<blockquote>
<p><code>Input string: foo:bar\:baz\\:qux\\\:quux\\\\:corge</code><br>
<code>Part 1: foo</code><br>
<code>Part 2: bar\:baz\\</code><br>
<code>Part 3: qux\\\:quux\\\\</code><br>
<code>Part 4: corge</code>
</p>
</blockquote>
<p>Another way would be to match the parts themselves, instead of splitting at the delimiters.</p>
<pre><code>// Continues the example above (reuses text); also needs
// java.util.List and java.util.LinkedList.
Pattern p2 = Pattern.compile("(?<=\\A|\\G:)((?:\\\\.|[^:\\\\])*)");
List<String> parts2 = new LinkedList<String>();
Matcher m = p2.matcher(text);
while (m.find()) {
    parts2.add(m.group(1));
}
</code></pre>
<p>The strange syntax is there because the pattern needs to handle empty pieces at the start and end of the string. When a match spans exactly zero characters, the next attempt will start one character past the end of it. If it didn't, it would match another empty string, and another, ad infinitum…</p>
<ul>
<li><code>(?<=\A|\G:)</code> will look behind for either the start of the string (the first piece), or the end of the previous match, followed by the separator. If we did <code>(?:\A|\G:)</code>, it would fail if the first piece is empty (input starts with a separator), because the next attempt would then start one character past the position where <code>\G</code> matches.</li>
<li><code>\\.</code> matches any escaped character.</li>
<li><code>[^:\\]</code> matches any character that is neither the delimiter nor part of an escape sequence (because <code>\\.</code> has already consumed both characters of the sequence).</li>
<li><code>((?:\\.|[^:\\])*)</code> captures all characters up until the first non-escaped delimiter into capture-group 1.</li>
</ul>
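<p>For comparison, the same splitting rule can be sketched without a regex, as a plain character loop. This is not part of the approach above, only an illustration of what <code>\\.</code> and <code>[^:\\]</code> are doing; the class name <code>EscapedSplit</code> and the method <code>splitUnescaped</code> are hypothetical names chosen for this sketch. It assumes an escape character always protects exactly the next character, and it keeps a lone trailing escape character as ordinary text, which may differ from how the regex treats that edge case.</p>

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedSplit {
    // Hypothetical helper: splits text at every delimiter that is not
    // preceded by an escape character. Escape sequences are kept verbatim.
    public static List<String> splitUnescaped(String text, char delimiter, char escape) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == escape && i + 1 < text.length()) {
                // Escape sequence: consume both characters, like \\. in the regex.
                current.append(c).append(text.charAt(++i));
            } else if (c == delimiter) {
                // Unescaped delimiter: close the current piece (it may be empty).
                parts.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character, like [^:\\] in the regex.
                current.append(c);
            }
        }
        // The last piece has no delimiter after it.
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String text = "foo:bar\\:baz\\\\:qux\\\\\\:quux\\\\\\\\:corge";
        List<String> parts = splitUnescaped(text, ':', '\\');
        for (int i = 0; i < parts.size(); i++) {
            System.out.printf("Part %d: %s%n", i + 1, parts.get(i));
        }
    }
}
```

<p>For the sample input used above this yields the same four parts. Unlike <code>Pattern.split</code>, the loop also keeps empty pieces at the end of the input.</p>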
