
  1. Optimizing a regular expression to parse Chinese pinyin
    <p>I have a working regular expression to parse PinYin which matches every valid PinYin and does not match invalid ones. I am wondering how I can optimize it.</p> <pre><code>^(?P&lt;initial&gt;ch|zh|sh|r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z|) (?P&lt;final&gt; (?:(?&lt;=ch)uang|(?&lt;=ch)ang|(?&lt;=ch)eng|(?&lt;=ch)ong|(?&lt;=ch)uai|(?&lt;=ch)uan|(?&lt;=ch)ai|(?&lt;=ch)an|(?&lt;=ch)ao|(?&lt;=ch)en|(?&lt;=ch)ou|(?&lt;=ch)ua|(?&lt;=ch)ui|(?&lt;=ch)un|(?&lt;=ch)uo|(?&lt;=ch)a|(?&lt;=ch)e|(?&lt;=ch)i|(?&lt;=ch)u) |(?:(?&lt;=zh)uang|(?&lt;=zh)ang|(?&lt;=zh)eng|(?&lt;=zh)ong|(?&lt;=zh)uai|(?&lt;=zh)uan|(?&lt;=zh)ai|(?&lt;=zh)an|(?&lt;=zh)ao|(?&lt;=zh)ei|(?&lt;=zh)en|(?&lt;=zh)ou|(?&lt;=zh)ua|(?&lt;=zh)ui|(?&lt;=zh)un|(?&lt;=zh)uo|(?&lt;=zh)a|(?&lt;=zh)e|(?&lt;=zh)i|(?&lt;=zh)u) |(?:(?&lt;=sh)uang|(?&lt;=sh)ang|(?&lt;=sh)eng|(?&lt;=sh)uai|(?&lt;=sh)uan|(?&lt;=sh)ai|(?&lt;=sh)an|(?&lt;=sh)ao|(?&lt;=sh)ei|(?&lt;=sh)en|(?&lt;=sh)ou|(?&lt;=sh)ua|(?&lt;=sh)ui|(?&lt;=sh)un|(?&lt;=sh)uo|(?&lt;=sh)a|(?&lt;=sh)e|(?&lt;=sh)i|(?&lt;=sh)u) |(?:(?&lt;=c)ang|(?&lt;=c)eng|(?&lt;=c)ong|(?&lt;=c)uan|(?&lt;=c)ai|(?&lt;=c)an|(?&lt;=c)ao|(?&lt;=c)en|(?&lt;=c)ou|(?&lt;=c)ui|(?&lt;=c)un|(?&lt;=c)uo|(?&lt;=c)a|(?&lt;=c)e|(?&lt;=c)i|(?&lt;=c)u) |(?:(?&lt;=b)ang|(?&lt;=b)eng|(?&lt;=b)ian|(?&lt;=b)iao|(?&lt;=b)ing|(?&lt;=b)ai|(?&lt;=b)an|(?&lt;=b)ao|(?&lt;=b)ei|(?&lt;=b)en|(?&lt;=b)ie|(?&lt;=b)in|(?&lt;=b)a|(?&lt;=b)i|(?&lt;=b)o|(?&lt;=b)u) |(?:(?&lt;=d)ang|(?&lt;=d)eng|(?&lt;=d)ian|(?&lt;=d)iao|(?&lt;=d)ing|(?&lt;=d)ong|(?&lt;=d)uan|(?&lt;=d)ai|(?&lt;=d)an|(?&lt;=d)ao|(?&lt;=d)ei|(?&lt;=d)en|(?&lt;=d)ia|(?&lt;=d)ie|(?&lt;=d)iu|(?&lt;=d)ou|(?&lt;=d)ui|(?&lt;=d)un|(?&lt;=d)uo|(?&lt;=d)a|(?&lt;=d)e|(?&lt;=d)i|(?&lt;=d)u) |(?:(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)a|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)ai |(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)an|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)ang 
|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)ao|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)e </code></pre> <p>Above is an abbreviated version for the sake of readability. The whole expression can be found at the end of this post.</p> <p>I am specifically wondering whether combining two or more prefixes in a single lookbehind in front of each final would improve performance:</p> <pre><code> (?&lt;=ch|zh|sh)uang|(?&lt;=ch|zh|sh)ang... </code></pre> <p>Thanks for your time and suggestions.</p> <p>Whole regex:</p> <pre><code> ^(?P&lt;initial&gt;ch|zh|sh|r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z|)(?P&lt;final&gt;(?:(?&lt;=ch)uang|(?&lt;=ch)ang|(?&lt;=ch)eng|(?&lt;=ch)ong|(?&lt;=ch)uai|(?&lt;=ch)uan|(?&lt;=ch)ai|(?&lt;=ch)an|(?&lt;=ch)ao|(?&lt;=ch)en|(?&lt;=ch)ou|(?&lt;=ch)ua|(?&lt;=ch)ui|(?&lt;=ch)un|(?&lt;=ch)uo|(?&lt;=ch)a|(?&lt;=ch)e|(?&lt;=ch)i|(?&lt;=ch)u)|(?:(?&lt;=zh)uang|(?&lt;=zh)ang|(?&lt;=zh)eng|(?&lt;=zh)ong|(?&lt;=zh)uai|(?&lt;=zh)uan|(?&lt;=zh)ai|(?&lt;=zh)an|(?&lt;=zh)ao|(?&lt;=zh)ei|(?&lt;=zh)en|(?&lt;=zh)ou|(?&lt;=zh)ua|(?&lt;=zh)ui|(?&lt;=zh)un|(?&lt;=zh)uo|(?&lt;=zh)a|(?&lt;=zh)e|(?&lt;=zh)i|(?&lt;=zh)u)|(?:(?&lt;=sh)uang|(?&lt;=sh)ang|(?&lt;=sh)eng|(?&lt;=sh)uai|(?&lt;=sh)uan|(?&lt;=sh)ai|(?&lt;=sh)an|(?&lt;=sh)ao|(?&lt;=sh)ei|(?&lt;=sh)en|(?&lt;=sh)ou|(?&lt;=sh)ua|(?&lt;=sh)ui|(?&lt;=sh)un|(?&lt;=sh)uo|(?&lt;=sh)a|(?&lt;=sh)e|(?&lt;=sh)i|(?&lt;=sh)u)|(?:(?&lt;=c)ang|(?&lt;=c)eng|(?&lt;=c)ong|(?&lt;=c)uan|(?&lt;=c)ai|(?&lt;=c)an|(?&lt;=c)ao|(?&lt;=c)en|(?&lt;=c)ou|(?&lt;=c)ui|(?&lt;=c)un|(?&lt;=c)uo|(?&lt;=c)a|(?&lt;=c)e|(?&lt;=c)i|(?&lt;=c)u)|(?:(?&lt;=b)ang|(?&lt;=b)eng|(?&lt;=b)ian|(?&lt;=b)iao|(?&lt;=b)ing|(?&lt;=b)ai|(?&lt;=b)an|(?&lt;=b)ao|(?&lt;=b)ei|(?&lt;=b)en|(?&lt;=b)ie|(?&lt;=b)in|(?&lt;=b)a|(?&lt;=b)i|(?&lt;=b)o|(?&lt;=b)u)|(?:(?&lt;=d)ang|(?&lt;=d)eng|(?&lt;=d)ian|(?&lt;=d)iao|(?&lt;=d)ing|(?&lt;=d)ong|(?&lt;=d)uan|(?&lt;=d)ai|(?&lt;=d)an|(?&lt;=d)ao|(?&lt;=d)ei|(?&lt;=d)en|(?&lt;=d)ia|(?&lt;=d)ie|(?&lt;=d)iu|(?&lt;=d)ou|(?&lt;=d)ui|(?&lt;=d)un|(?&lt;=d)uo|(?&lt;=d)a|
(?&lt;=d)e|(?&lt;=d)i|(?&lt;=d)u)|(?:(?&lt;=g)uang|(?&lt;=g)ang|(?&lt;=g)eng|(?&lt;=g)ong|(?&lt;=g)uai|(?&lt;=g)uan|(?&lt;=g)ai|(?&lt;=g)an|(?&lt;=g)ao|(?&lt;=g)ei|(?&lt;=g)en|(?&lt;=g)ou|(?&lt;=g)ua|(?&lt;=g)ui|(?&lt;=g)un|(?&lt;=g)uo|(?&lt;=g)a|(?&lt;=g)e|(?&lt;=g)u)|(?:(?&lt;=f)ang|(?&lt;=f)eng|(?&lt;=f)iao|(?&lt;=f)an|(?&lt;=f)ei|(?&lt;=f)en|(?&lt;=f)ou|(?&lt;=f)a|(?&lt;=f)o|(?&lt;=f)u)|(?:(?&lt;!sh|ch|zh)(?&lt;=h)uang|(?&lt;!sh|ch|zh)(?&lt;=h)ang|(?&lt;!sh|ch|zh)(?&lt;=h)eng|(?&lt;!sh|ch|zh)(?&lt;=h)ong|(?&lt;!sh|ch|zh)(?&lt;=h)uai|(?&lt;!sh|ch|zh)(?&lt;=h)uan|(?&lt;!sh|ch|zh)(?&lt;=h)ai|(?&lt;!sh|ch|zh)(?&lt;=h)an|(?&lt;!sh|ch|zh)(?&lt;=h)ao|(?&lt;!sh|ch|zh)(?&lt;=h)ei|(?&lt;!sh|ch|zh)(?&lt;=h)en|(?&lt;!sh|ch|zh)(?&lt;=h)ou|(?&lt;!sh|ch|zh)(?&lt;=h)ua|(?&lt;!sh|ch|zh)(?&lt;=h)ui|(?&lt;!sh|ch|zh)(?&lt;=h)un|(?&lt;!sh|ch|zh)(?&lt;=h)uo|(?&lt;!sh|ch|zh)(?&lt;=h)a|(?&lt;!sh|ch|zh)(?&lt;=h)e|(?&lt;!sh|ch|zh)(?&lt;=h)u)|(?:(?&lt;=k)uang|(?&lt;=k)ang|(?&lt;=k)eng|(?&lt;=k)ong|(?&lt;=k)uai|(?&lt;=k)uan|(?&lt;=k)ai|(?&lt;=k)an|(?&lt;=k)ao|(?&lt;=k)en|(?&lt;=k)ou|(?&lt;=k)ua|(?&lt;=k)ui|(?&lt;=k)un|(?&lt;=k)uo|(?&lt;=k)a|(?&lt;=k)e|(?&lt;=k)u)|(?:(?&lt;=j)iang|(?&lt;=j)iong|(?&lt;=j)ian|(?&lt;=j)iao|(?&lt;=j)ing|(?&lt;=j)üan|(?&lt;=j)ia|(?&lt;=j)ie|(?&lt;=j)in|(?&lt;=j)iu|(?&lt;=j)üe|(?&lt;=j)ün|(?&lt;=j)i|(?&lt;=j)ü)|(?:(?&lt;=m)ang|(?&lt;=m)eng|(?&lt;=m)ian|(?&lt;=m)iao|(?&lt;=m)ing|(?&lt;=m)ai|(?&lt;=m)an|(?&lt;=m)ao|(?&lt;=m)ei|(?&lt;=m)en|(?&lt;=m)ie|(?&lt;=m)in|(?&lt;=m)iu|(?&lt;=m)ou|(?&lt;=m)a|(?&lt;=m)e|(?&lt;=m)i|(?&lt;=m)o|(?&lt;=m)u)|(?:(?&lt;=l)iang|(?&lt;=l)ang|(?&lt;=l)eng|(?&lt;=l)ian|(?&lt;=l)iao|(?&lt;=l)ing|(?&lt;=l)ong|(?&lt;=l)uan|(?&lt;=l)ai|(?&lt;=l)an|(?&lt;=l)ao|(?&lt;=l)ei|(?&lt;=l)ia|(?&lt;=l)ie|(?&lt;=l)in|(?&lt;=l)iu|(?&lt;=l)ou|(?&lt;=l)un|(?&lt;=l)uo|(?&lt;=l)üe|(?&lt;=l)a|(?&lt;=l)e|(?&lt;=l)i|(?&lt;=l)o|(?&lt;=l)u|(?&lt;=l)ü)|(?:(?&lt;=n)iang|(?&lt;=n)ang|(?&lt;=n)eng|(?&lt;=n)ian|(?&lt;=n)iao|(?&lt;=n)ing|(?&lt;=n)ong|(?&lt;=n)uan|(?&lt
;=n)ai|(?&lt;=n)an|(?&lt;=n)ao|(?&lt;=n)ei|(?&lt;=n)en|(?&lt;=n)ie|(?&lt;=n)in|(?&lt;=n)iu|(?&lt;=n)ou|(?&lt;=n)un|(?&lt;=n)uo|(?&lt;=n)üe|(?&lt;=n)a|(?&lt;=n)e|(?&lt;=n)i|(?&lt;=n)u|(?&lt;=n)ü)|(?:(?&lt;=q)iang|(?&lt;=q)iong|(?&lt;=q)ian|(?&lt;=q)iao|(?&lt;=q)ing|(?&lt;=q)üan|(?&lt;=q)ia|(?&lt;=q)ie|(?&lt;=q)in|(?&lt;=q)iu|(?&lt;=q)üe|(?&lt;=q)ün|(?&lt;=q)i|(?&lt;=q)ü)|(?:(?&lt;=p)ang|(?&lt;=p)eng|(?&lt;=p)ian|(?&lt;=p)iao|(?&lt;=p)ing|(?&lt;=p)ai|(?&lt;=p)an|(?&lt;=p)ao|(?&lt;=p)ei|(?&lt;=p)en|(?&lt;=p)ie|(?&lt;=p)in|(?&lt;=p)ou|(?&lt;=p)a|(?&lt;=p)i|(?&lt;=p)o|(?&lt;=p)u)|(?:(?&lt;=s)ang|(?&lt;=s)eng|(?&lt;=s)ong|(?&lt;=s)uan|(?&lt;=s)ai|(?&lt;=s)an|(?&lt;=s)ao|(?&lt;=s)en|(?&lt;=s)ou|(?&lt;=s)ui|(?&lt;=s)un|(?&lt;=s)uo|(?&lt;=s)a|(?&lt;=s)e|(?&lt;=s)i|(?&lt;=s)u)|(?:(?&lt;=r)ang|(?&lt;=r)eng|(?&lt;=r)ong|(?&lt;=r)uan|(?&lt;=r)an|(?&lt;=r)ao|(?&lt;=r)en|(?&lt;=r)ou|(?&lt;=r)ua|(?&lt;=r)ui|(?&lt;=r)un|(?&lt;=r)uo|(?&lt;=r)e|(?&lt;=r)i|(?&lt;=r)u)|(?:(?&lt;=t)ang|(?&lt;=t)eng|(?&lt;=t)ian|(?&lt;=t)iao|(?&lt;=t)ing|(?&lt;=t)ong|(?&lt;=t)uan|(?&lt;=t)ai|(?&lt;=t)an|(?&lt;=t)ao|(?&lt;=t)ei|(?&lt;=t)ie|(?&lt;=t)ou|(?&lt;=t)ui|(?&lt;=t)un|(?&lt;=t)uo|(?&lt;=t)a|(?&lt;=t)e|(?&lt;=t)i|(?&lt;=t)u)|(?:(?&lt;=w)ang|(?&lt;=w)eng|(?&lt;=w)ai|(?&lt;=w)an|(?&lt;=w)ei|(?&lt;=w)en|(?&lt;=w)a|(?&lt;=w)o|(?&lt;=w)u)|(?:(?&lt;=y)ang|(?&lt;=y)ing|(?&lt;=y)ong|(?&lt;=y)uan|(?&lt;=y)ai|(?&lt;=y)an|(?&lt;=y)ao|(?&lt;=y)in|(?&lt;=y)ou|(?&lt;=y)ue|(?&lt;=y)un|(?&lt;=y)a|(?&lt;=y)e|(?&lt;=y)e|(?&lt;=y)i|(?&lt;=y)o|(?&lt;=y)u)|(?:(?&lt;=x)iang|(?&lt;=x)iong|(?&lt;=x)ian|(?&lt;=x)iao|(?&lt;=x)ing|(?&lt;=x)üan|(?&lt;=x)ia|(?&lt;=x)ie|(?&lt;=x)in|(?&lt;=x)iu|(?&lt;=x)üe|(?&lt;=x)ün|(?&lt;=x)i|(?&lt;=x)ü)|(?:(?&lt;=z)ang|(?&lt;=z)eng|(?&lt;=z)ong|(?&lt;=z)uan|(?&lt;=z)ai|(?&lt;=z)an|(?&lt;=z)ao|(?&lt;=z)ei|(?&lt;=z)en|(?&lt;=z)ou|(?&lt;=z)ui|(?&lt;=z)un|(?&lt;=z)uo|(?&lt;=z)a|(?&lt;=z)e|(?&lt;=z)i|(?&lt;=z)u)|(?:(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)a|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|
s|t|w|y|x|z)ai|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)an|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)ang|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)ao|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)e|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)ei|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)en|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)eng|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)er|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)o|(?&lt;!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)ou))$ </code></pre>
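    <p>One way to explore the question is to generate both forms of the pattern from a table and compare them. The following is a minimal sketch, not a definitive answer: it assumes Python's <code>re</code> module (suggested by the <code>(?P&lt;name&gt;...)</code> syntax), and the names <code>FINALS</code>, <code>lookbehind_pattern</code>, <code>grouped_pattern</code>, <code>parse</code> and <code>compare_speed</code> are hypothetical. Only three initials are included, with finals copied from the expression above. Instead of merging prefixes per final, the second builder asserts each initial once in front of a group of its finals, which cuts the number of lookbehinds to one per initial. Whether that is faster has to be measured.</p>

```python
import re
import timeit

# Finals copied from the question for three initials only (an assumption made
# to keep the sketch short); a full table would list every initial.
FINALS = {
    "ch": ["uang", "ang", "eng", "ong", "uai", "uan", "ai", "an", "ao", "en",
           "ou", "ua", "ui", "un", "uo", "a", "e", "i", "u"],
    "f": ["ang", "eng", "iao", "an", "ei", "en", "ou", "a", "o", "u"],
    "w": ["ang", "eng", "ai", "an", "ei", "en", "a", "o", "u"],
}


def lookbehind_pattern(table):
    """Style used in the question: one lookbehind per (initial, final) pair."""
    initials = "|".join(table)
    finals = "|".join(
        "(?<=%s)%s" % (ini, fin) for ini, fins in table.items() for fin in fins
    )
    return re.compile("^(?P<initial>%s)(?P<final>%s)$" % (initials, finals))


def grouped_pattern(table):
    """Alternative: one lookbehind per initial, followed by a group of finals."""
    initials = "|".join(table)
    finals = "|".join(
        "(?<=%s)(?:%s)" % (ini, "|".join(fins)) for ini, fins in table.items()
    )
    return re.compile("^(?P<initial>%s)(?P<final>%s)$" % (initials, finals))


def parse(pattern, syllable):
    """Return the named groups as a dict, or None if the syllable is rejected."""
    match = pattern.match(syllable)
    return match.groupdict() if match else None


def compare_speed(words, number=10000):
    """Time both patterns on the same words; returns (per_pair, grouped) seconds."""
    old, new = lookbehind_pattern(FINALS), grouped_pattern(FINALS)
    t_old = timeit.timeit(lambda: [old.match(w) for w in words], number=number)
    t_new = timeit.timeit(lambda: [new.match(w) for w in words], number=number)
    return t_old, t_new
```

    <p>Calling <code>compare_speed(["chuang", "fa", "wo", "chei"])</code> returns the two timings for the local interpreter. Before trusting any speed-up, it is worth checking with <code>parse</code> that both patterns accept and reject the same syllables.</p>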