<h1>Named Capture Examples</h1>
<blockquote> <p>Can you think of some examples where the named pattern are very useful or not useful at all? </p> </blockquote>
<p>In answer to your question, here is an example where named patterns are especially useful. It’s a Perl or PCRE pattern for parsing an RFC 5322 mail address. First, it’s in <code>/x</code> mode by virtue of <code>(?x)</code>. Second, it separates the definitions from the invocation; the named group <code>address</code> is the thing that does the full recursive-descent parse. Its definition follows the invocation, in the non-executing <code>(?(DEFINE)…)</code> block.</p>
<pre><code>(?x)              # allow whitespace and comments

(?&amp;address)       # this is the capture we call as a "regex subroutine"

# the rest is all definitions, in a nicely BNF-style
(?(DEFINE)

  (?&lt;address&gt; (?&amp;mailbox) | (?&amp;group))
  (?&lt;mailbox&gt; (?&amp;name_addr) | (?&amp;addr_spec))
  (?&lt;name_addr&gt; (?&amp;display_name)? (?&amp;angle_addr))
  (?&lt;angle_addr&gt; (?&amp;CFWS)? &lt; (?&amp;addr_spec) &gt; (?&amp;CFWS)?)
  (?&lt;group&gt; (?&amp;display_name) : (?:(?&amp;mailbox_list) | (?&amp;CFWS))? ; (?&amp;CFWS)?)
  (?&lt;display_name&gt; (?&amp;phrase))
  (?&lt;mailbox_list&gt; (?&amp;mailbox) (?: , (?&amp;mailbox))*)

  (?&lt;addr_spec&gt; (?&amp;local_part) \@ (?&amp;domain))
  (?&lt;local_part&gt; (?&amp;dot_atom) | (?&amp;quoted_string))
  (?&lt;domain&gt; (?&amp;dot_atom) | (?&amp;domain_literal))
  (?&lt;domain_literal&gt; (?&amp;CFWS)? \[ (?: (?&amp;FWS)? (?&amp;dcontent))* (?&amp;FWS)? \] (?&amp;CFWS)?)
  (?&lt;dcontent&gt; (?&amp;dtext) | (?&amp;quoted_pair))
  (?&lt;dtext&gt; (?&amp;NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])

  # the hyphen is escaped so that "+-/" is not read as a character range
  (?&lt;atext&gt; (?&amp;ALPHA) | (?&amp;DIGIT) | [!#\$%&amp;'*+\-/=?^_`{|}~])
  (?&lt;atom&gt; (?&amp;CFWS)? (?&amp;atext)+ (?&amp;CFWS)?)
  (?&lt;dot_atom&gt; (?&amp;CFWS)? (?&amp;dot_atom_text) (?&amp;CFWS)?)
  (?&lt;dot_atom_text&gt; (?&amp;atext)+ (?: \. (?&amp;atext)+)*)

  (?&lt;text&gt; [\x01-\x09\x0b\x0c\x0e-\x7f])
  (?&lt;quoted_pair&gt; \\ (?&amp;text))

  (?&lt;qtext&gt; (?&amp;NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
  (?&lt;qcontent&gt; (?&amp;qtext) | (?&amp;quoted_pair))
  (?&lt;quoted_string&gt; (?&amp;CFWS)? (?&amp;DQUOTE) (?:(?&amp;FWS)? (?&amp;qcontent))* (?&amp;FWS)? (?&amp;DQUOTE) (?&amp;CFWS)?)

  (?&lt;word&gt; (?&amp;atom) | (?&amp;quoted_string))
  (?&lt;phrase&gt; (?&amp;word)+)

  # Folding white space
  (?&lt;FWS&gt; (?: (?&amp;WSP)* (?&amp;CRLF))? (?&amp;WSP)+)
  (?&lt;ctext&gt; (?&amp;NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
  (?&lt;ccontent&gt; (?&amp;ctext) | (?&amp;quoted_pair) | (?&amp;comment))
  (?&lt;comment&gt; \( (?: (?&amp;FWS)? (?&amp;ccontent))* (?&amp;FWS)? \) )
  (?&lt;CFWS&gt; (?: (?&amp;FWS)? (?&amp;comment))* (?: (?:(?&amp;FWS)? (?&amp;comment)) | (?&amp;FWS)))

  # No whitespace control
  (?&lt;NO_WS_CTL&gt; [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])

  (?&lt;ALPHA&gt; [A-Za-z])
  (?&lt;DIGIT&gt; [0-9])
  (?&lt;CRLF&gt; \x0d \x0a)
  (?&lt;DQUOTE&gt; ")
  (?&lt;WSP&gt; [\x20\x09])
)
</code></pre>
<p>I strongly suggest not reïnventing a perfectly good wheel. Start with becoming PCRE-compatible. If you wish to go beyond basic Perl5 patterns like the RFC5322-parser above, there’s always <a href="http://perlcabal.org/syn/S05.html" rel="nofollow">Perl6 patterns</a> to draw upon.</p>
<p>It <strong>really, really</strong> pays to do research into existing practice and literature before haring off on an open-ended R&amp;D mission. 
These problems were all solved long ago, sometimes quite elegantly.</p>
<h1>Improving Java Regex Syntax</h1>
<p>If you truly want better regex syntax ideas for Java, you must first address these particular flaws in Java’s regexes:</p>
<ol>
<li>Lack of multiline pattern strings, as demonstrated above.</li>
<li>Insanely onerous and error-prone double-backslashing, as also demonstrated above.</li>
<li>Lack of compile-time exceptions on invalid regex literals, and lack of compile-time caching of correctly compiled regex literals.</li>
<li>No way to change something like <code>"foo".matches(pattern)</code> to use a better pattern library, partly but not solely because of <code>final</code> classes that cannot be extended.</li>
<li>No debugging or profiling facilities.</li>
<li>Lack of compliance with <a href="http://unicode.org/reports/tr18/" rel="nofollow">UTS#18: Basic Regular Expression support</a>, the very most elementary steps necessary to make Java regexes useful for Unicode. They currently are not. They don’t even support Unicode 3.1 properties from a decade ago, which means you cannot use Java patterns for Unicode in any reasonable fashion; the basic building blocks are absent.</li>
</ol>
<p>Of these, the first three have been addressed in several JVM languages, including both Groovy and Scala; even Clojure goes part-way there.</p>
<p>The remaining three will be tougher, but they are absolutely mandatory. The last one, the absence of even the most basic Unicode support in regexes, simply kills Java for Unicode work. This is completely inexcusable this late in the game. I can provide plenty of examples if need be, but you should trust me, because I really do know what I’m talking about here.</p>
<p>Only once you have accomplished all of these should you worry about fixing up Java’s regexes so they can catch up with the current state of the art in pattern matching. 
Until and unless you take care of these past oversights, you can’t begin to look to the present, let alone to the future.</p>
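<p>As a rough illustration of the first three flaws in the list above, here is a minimal Java sketch. The class name <code>JavaRegexPainPoints</code>, the helper names, and the deliberately simplified address pattern are assumptions made up for this example; the pattern is nowhere near the RFC 5322 grammar shown earlier, which Java’s <code>java.util.regex</code> cannot express because it has no <code>(?(DEFINE)…)</code> block or <code>(?&amp;name)</code> subroutine calls.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class JavaRegexPainPoints {
    // Flaws 1 and 2: without a multiline pattern literal, an /x-style pattern
    // is stitched together from string pieces with explicit "\n" terminators,
    // and every regex backslash has to be doubled.
    // The pattern itself is a toy (assumption), not a real address grammar.
    static final String ADDR_SPEC =
          "(?x)                            # allow whitespace and comments\n"
        + "(?<local> [A-Za-z0-9._]+ )      # simplified local part\n"
        + "\\@                             # literal at sign\n"
        + "(?<domain> [A-Za-z0-9.\\-]+ )   # simplified domain\n";

    // Returns the text captured by the named group "domain",
    // or null when the whole input does not match.
    static String domainOf(String address) {
        Matcher m = Pattern.compile(ADDR_SPEC).matcher(address);
        return m.matches() ? m.group("domain") : null;
    }

    // Flaw 3: to javac a malformed pattern is just a string, so the error
    // only appears as an unchecked PatternSyntaxException at run time.
    static boolean failsOnlyAtRuntime(String pattern) {
        try {
            Pattern.compile(pattern);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(domainOf("user@example.org"));
        System.out.println(failsOnlyAtRuntime("(?<oops>[a-z"));
    }
}
```

<p>Named groups do make the extraction readable, but note how much of the source is string plumbing rather than pattern, and that the broken pattern compiles without complaint.</p>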