<p>Assuming that your regex engine supports lookbehinds, atomic groups and possessive quantifiers (all of which are PCRE features):</p>
<h3>Some examples of what can be replaced:</h3>
<ul>
<li><p>all <code>(?:</code> by <code>(?&gt;</code></p></li>
<li><p>the beginning <i>(the whole first named group)</i> by:</p> <p><code>^(?P&lt;initial&gt;(?&gt;[csz]h?+|[bdfghj-npqrtwxy])?)</code></p></li>
<li><p>this part* by:</p> <p><code>|(?&lt;![csz]h)(?&lt;=h)(?&gt;a(?&gt;[io]|ng?+)?|e(?&gt;i|ng?+)?|o(?&gt;u|ng)|u(?&gt;[ino]|a(?&gt;i|ng?+)?)?)</code> </p></li>
</ul>
<p><i>*( i.e.: <code>|(?:(?&lt;!sh|ch|zh)(?&lt;=h)uang|(?&lt;!sh|ch|...|(?&lt;!sh|ch|zh)(?&lt;=h)u)</code> )</i></p>
<ul>
<li><p>the last part* by:</p> <p><code>|(?&lt;![bcdfghj-np-tw-z])(?&gt;a(?&gt;[io]|ng?+)?|e(?&gt;[ir]|ng?+)?|ou?+))$</code></p></li>
</ul>
<p><i>*( i.e.: <code>|(?:(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)a|(?&lt;!r|c|b|d|...))$</code> )</i></p>
<h3>How to deal with the other parts:</h3>
<p>example:</p>
<pre><code>(?:(?&lt;=ch)uang|(?&lt;=ch)ang|(?&lt;=ch)eng|(?&lt;=ch)ong|(?&lt;=ch)uai|(?&lt;=ch)uan|(?&lt;=ch)ai|(?&lt;=ch)an|(?&lt;=ch)ao|(?&lt;=ch)en|(?&lt;=ch)ou|(?&lt;=ch)ua|(?&lt;=ch)ui|(?&lt;=ch)un|(?&lt;=ch)uo|(?&lt;=ch)a|(?&lt;=ch)e|(?&lt;=ch)i|(?&lt;=ch)u)
</code></pre>
<p><i>All parts of this kind share the same lookbehind; apply the following steps to each of them.</i></p>
<pre><code># step 1: lookarounds factorization
(?&lt;=ch)(?&gt;uang|ang|eng|ong|uai|uan|ai|an|ao|en|ou|ua|ui|un|uo|a|e|i|u)

# step 2: sort all the content by alphabetic order
(?&lt;=ch)(?&gt;a|ai|an|ang|ao|e|en|eng|i|ong|ou|u|ua|uai|uan|uang|ui|un|uo)

# step 3: group by first letter: don't forget the ? if the letter can be alone
(?&lt;=ch)(?&gt;a(?&gt;i|n|ng|o)?|e(?&gt;n|ng)?|i|o(?&gt;ng|u)|u(?&gt;a|ai|an|ang|i|n|o)?)

# step 4: reduce the terminations (ie: n &amp; ng =&gt; ng?+)
(?&lt;=ch)(?&gt;a(?&gt;i|ng?+|o)?|e(?&gt;ng?+)?|i|o(?&gt;ng|u)|u(?&gt;a(?&gt;i|ng?+)?|i|n|o)?)

# step 5: put single letters in a character class
(?&lt;=ch)(?&gt;a(?&gt;[io]|ng?+)?|e(?&gt;ng?+)?|i|o(?&gt;ng|u)|u(?&gt;a(?&gt;i|ng?+)?|[ino])?)
</code></pre>
<h3>conclusion</h3>
<p>Although the result is shorter, the goal here is optimization, not brevity. I reduced the number of tests with the factorization, and the number of backtracks with atomic groups and possessive quantifiers.</p>
<p><strong>some limitations</strong></p>
<p>Note that regex features like <a href="http://www.regular-expressions.info/atomic.html" rel="nofollow noreferrer">atomic groups</a> and <a href="http://www.regular-expressions.info/possessive.html" rel="nofollow noreferrer">possessive quantifiers</a> are not supported by all regex flavors, but there are workarounds: </p>
<ul>
<li>for flavors that don't support possessive quantifiers: change <code>?+</code> to <code>?</code></li>
<li>for flavors that don't support atomic groups: change <code>(?&gt;</code> to <code>(?:</code></li>
</ul>
<p><i>(Note that there is a trick to emulate atomic groups in Python: you can wrap the whole pattern with it and measure the effect with a timer. See this incredible post: <a href="https://stackoverflow.com/questions/13577372/do-python-regular-expressions-have-an-equivalent-to-rubys-atomic-grouping">Do Python regular expressions have an equivalent to Ruby&#39;s atomic grouping?</a>)</i></p>
<p>Some regex engines, such as JavaScript before ES2018, do not support lookbehinds. In this case, you must rewrite your whole pattern using only alternations (i.e. <code>|</code>), which isn't a bad thing since lookbehinds make your pattern slower, and give up the named captures, which are not supported either. 
<i>(Note that to remove the negative lookbehinds, you need to put the syllables described in those parts before all the others, so that they are matched first.)</i></p>
<p><strong>other ways of optimization</strong></p>
<ul>
<li>rewrite your pattern without lookbehinds and with <code>|</code> instead</li>
<li>sort the different lines so that the most frequently used syllables come first</li>
</ul>
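<p>A refactoring like this is easy to get subtly wrong, so it can be worth checking mechanically that the factorized pattern accepts exactly the same strings as the original alternation. The following is only a minimal sketch in Python, not part of the original pattern: the names <code>CH_FINALS</code>, <code>compile_syllable</code> and <code>accepted</code> are made up for the illustration, and the factorized pattern is the one obtained by applying the five steps to the full list of finals after <code>ch</code>. It uses the portable spelling (<code>(?:</code> and <code>?</code>) so that it runs on any Python 3; the atomic/possessive spelling is only exercised on Python 3.11 or later, where the <code>re</code> module accepts it. Only the final form is compared: the intermediate alphabetically sorted forms list shorter alternatives first (<code>a</code> before <code>ai</code>), which an atomic group would commit to.</p>

```python
import re
import sys
from itertools import product

# Finals allowed after "ch", copied from the original alternation.
CH_FINALS = ["uang", "ang", "eng", "ong", "uai", "uan", "ai", "an", "ao",
             "en", "ou", "ua", "ui", "un", "uo", "a", "e", "i", "u"]

# Original form: one lookbehind per alternative.
ORIGINAL = "(?:" + "|".join("(?<=ch)" + final for final in CH_FINALS) + ")"

# Factorized form, portable spelling: (?> written as (?:, ?+ written as ?.
FACTORIZED = (r"(?<=ch)(?:a(?:[io]|ng?)?|e(?:ng?)?|i|o(?:ng|u)"
              r"|u(?:a(?:i|ng?)?|[ino])?)")

# Same pattern with atomic groups and possessive quantifiers.
# Assumption: only compiled when the interpreter is Python 3.11 or later.
ATOMIC = (r"(?<=ch)(?>a(?>[io]|ng?+)?|e(?>ng?+)?|i|o(?>ng|u)"
          r"|u(?>a(?>i|ng?+)?|[ino])?)")


def compile_syllable(final_pattern):
    """Anchor a 'finals' pattern so that a whole 'ch...' syllable must match."""
    return re.compile("^ch" + final_pattern + "$")


def accepted(final_pattern, alphabet="aegionu", max_len=4):
    """Return every tail (up to max_len letters) accepted after 'ch'."""
    rx = compile_syllable(final_pattern)
    tails = set()
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            tail = "".join(letters)
            if rx.match("ch" + tail):
                tails.add(tail)
    return tails


if __name__ == "__main__":
    print(accepted(ORIGINAL) == accepted(FACTORIZED))
    if sys.version_info >= (3, 11):
        print(accepted(ORIGINAL) == accepted(ATOMIC))
```

<p>The brute-force comparison over a small alphabet is a deliberate choice: the set of candidate strings is tiny, so an exhaustive check is cheaper and more convincing than a handful of hand-picked examples.</p>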
 
