Capturing <thisPartOnly> and (thisPartOnly) with the same group
<p>Let's say we have the following input:</p>
<pre><code>&lt;amy&gt; (bob) &lt;carol) (dean&gt;
</code></pre>
<p>We also have the following regex:</p>
<pre><code>&lt;(\w+)&gt;|\((\w+)\)
</code></pre>
<p>Now we get two matches (<a href="http://www.rubular.com/r/nfwk7d5YRG" rel="nofollow noreferrer">as seen on rubular.com</a>):</p>
<ul> <li><code>&lt;amy&gt;</code> is a match, <code>\1</code> captures <code>amy</code>, <code>\2</code> fails</li> <li><code>(bob)</code> is a match, <code>\2</code> captures <code>bob</code>, <code>\1</code> fails</li> </ul>
<p>This regex does most of what we want:</p>
<ul> <li>It matches the open and close brackets properly (i.e. no mixing)</li> <li>It captures the part we're interested in</li> </ul>
<p>However, it does have a few drawbacks:</p>
<ul> <li>The capturing pattern (i.e. the "main" part) is repeated <ul> <li>It's only <code>\w+</code> in this case, but in general it can be quite complex: <ul> <li>If it involves backreferences, then they must be renumbered for each alternative!</li> <li>Repetition makes maintenance a nightmare! (What if it changes?)</li> </ul></li> </ul></li> <li>The groups are essentially duplicated <ul> <li>Depending on which alternative matches, we must query different groups <ul> <li>It's only <code>\1</code> or <code>\2</code> in this case, but in general the "main" part can have capturing groups of its own!</li> </ul></li> <li>Not only is this inconvenient, but there may be situations where it is not feasible (e.g. when we're using a custom regex framework that is limited to querying only one group)</li> </ul></li> <li>The situation quickly worsens if we also want to match <code>{...}</code>, <code>[...]</code>, etc.</li> </ul>
<p>So the question is obvious: <strong>how can we do this without repeating the "main" pattern?</strong></p>
<blockquote> <p>Note: for the most part I'm interested in the <code>java.util.regex</code> flavor, but other flavors are welcome.</p> </blockquote>
<hr>
<h3>Appendix</h3>
<p>There's nothing new in this section; it only illustrates the problem described above with a larger example.</p>
<p>Let's take the example above one step further: we now want to match these:</p>
<pre><code>&lt;amy=amy&gt;
(bob=bob)
[carol=carol]
</code></pre>
<p>But not these:</p>
<pre><code>&lt;amy=amy) # non-matching bracket
&lt;amy=bob&gt; # left-hand side not equal to right-hand side
</code></pre>
<p>Using the alternation technique, the following works (<a href="http://www.rubular.com/r/ojoknCda2A" rel="nofollow noreferrer">as seen on rubular.com</a>):</p>
<pre><code>&lt;((\w+)=\2)&gt;|\(((\w+)=\4)\)|\[((\w+)=\6)\]
</code></pre>
<p>As explained above:</p>
<ul> <li>The main pattern can't simply be repeated; backreferences must be renumbered</li> <li>Repetition also makes maintenance a nightmare if the pattern ever changes</li> <li>Depending on which alternative matches, we must query either <code>\1 \2</code>, <code>\3 \4</code>, or <code>\5 \6</code></li> </ul>
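To make the "query different groups" drawback concrete in the java.util.regex flavor the question focuses on, here is a minimal sketch that runs both alternation regexes from the question and reports which numbered groups actually participated in each match. It relies only on Pattern and Matcher; Matcher.group(int) returns null for a group that sits in an alternative that did not match. The class name AlternationGroups and the helper describeMatches are hypothetical names chosen for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows which numbered groups participate
// depending on which alternative of the regex matched.
public class AlternationGroups {

    // For each match, build "matchedText -> gN=value ..." listing only the
    // groups that participated; groups in other alternatives are null.
    public static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            StringBuilder sb = new StringBuilder(m.group());
            sb.append(" ->");
            for (int g = 1; g <= m.groupCount(); g++) {
                String value = m.group(g);
                if (value != null) {
                    sb.append(" g").append(g).append('=').append(value);
                }
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Regex from the question: <(\w+)>|\((\w+)\)
        String simple = "<(\\w+)>|\\((\\w+)\\)";
        for (String s : describeMatches(simple, "<amy> (bob) <carol) (dean>")) {
            System.out.println(s);
        }

        // Appendix regex: the backreferences \2, \4, \6 had to be renumbered
        // by hand for each alternative.
        String appendix = "<((\\w+)=\\2)>|\\(((\\w+)=\\4)\\)|\\[((\\w+)=\\6)\\]";
        String input = "<amy=amy> (bob=bob) [carol=carol] <amy=amy) <amy=bob>";
        for (String s : describeMatches(appendix, input)) {
            System.out.println(s);
        }
    }
}
```

Running main prints the following; the mismatched <carol), (dean>, <amy=amy) and <amy=bob> produce no match, and each match populates a different pair of group numbers:

```
<amy> -> g1=amy
(bob) -> g2=bob
<amy=amy> -> g1=amy=amy g2=amy
(bob=bob) -> g3=bob=bob g4=bob
[carol=carol] -> g5=carol=carol g6=carol
```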