  1. PORegex: How do I capture a group after an optional capturing group using regular expressions?
    <p>Suppose I have the following strings:</p>
<pre><code>s1=u'--FE(-)---'
s2=u'--FEM(-)---'
s3=u'--FEE(--)-'
</code></pre>
    <p>and I want to match F, E, E, M and the content of the parentheses in different groups.</p>
    <p>I have tried the following regular expression:</p>
<pre><code>u'^.-([F])([EF]*)([E]+)[^FEM]?(M*)?(\\(.*\\))?.*$'
</code></pre>
    <p>This expression gives the following groups and spans for the different strings:</p>
<pre><code>s1 -&gt; 'F',(2,3) , '',(3,3) , 'E',(3,4) , '',(5,5) , None,(-1,-1)
s2 -&gt; 'F',(2,3) , '',(3,3) , 'E',(3,4) , 'M',(4,5) , (-),(5,8)
s3 -&gt; 'F',(2,3) , 'E',(3,4) , 'E',(4,5) , '',(6,6) , None,(-1,-1)
</code></pre>
    <p>For s2, I get the wanted behaviour, a match on the contents of the parentheses, but for s1 and s3 I don't.</p>
    <p>How do I create a regular expression that will match the content of the parentheses even if I don't have a proper match for the group containing 'M's?</p>
    <p>EDIT:</p>
    <p>The answer by DWilches resolved the initial issue using the regular expression</p>
<pre><code>'^.-(F)([EF]*)(E+)[^FEM]??(M*)(\(.*\)).*?$'
</code></pre>
    <p>However, the parentheses group is also optional. The following short Python script clarifies the problem:</p>
<pre><code>s1=u'--FE(-)---'
s2=u'--FEM(-)--'
s3=u'--FEE(--)-'
s4=u'--FEE-M(---)--'
s5=u'--FE-M-(-)-'
s6=u'--FEM--'
s7=u'--FE-M--'
ll=[s1,s2,s3,s4,s5,s6,s7]

import re
rr1=re.compile(u'^.-(F)([EF]*)(E+)[^FEM]??(M*)[^FEM]??(\(.*\)).*?$')
rr2=re.compile(u'^.-(F)([EF]*)(E+)[^FEM]??(M*)[^FEM]??(\(.*\))?.*?$')

for s in ll:
    b=rr1.search(s)
    print s
    if b:
        print " '%s' '%s' '%s' '%s' '%s' " % (b.group(1), b.group(2), b.group(3), b.group(4), b.group(5))
    else:
        print 'No match'
    print '######'
</code></pre>
    <p>For <code>rr1</code>, the output is:</p>
<pre><code>--FE(-)---
 'F' '' 'E' '' '(-)'
######
--FEM(-)--
 'F' '' 'E' 'M' '(-)'
######
--FEE(--)-
 'F' 'E' 'E' '' '(--)'
######
--FEE-M(---)--
 'F' 'E' 'E' 'M' '(---)'
######
--FE-M-(-)-
 'F' '' 'E' 'M' '(-)'
######
--FEM--
No match
######
--FE-M--
No match
######
</code></pre>
    <p>This is OK for the first five strings, but not for the last two, since <code>rr1</code> requires the parentheses.</p>
    <p><code>rr2</code>, however, which adds <code>?</code> after <code>(\(.*\))</code>, yields the following output:</p>
<pre><code>--FE(-)---
 'F' '' 'E' '' '(-)'
######
--FEM(-)--
 'F' '' 'E' 'M' '(-)'
######
--FEE(--)-
 'F' 'E' 'E' '' '(--)'
######
--FEE-M(---)--
 'F' 'E' 'E' '' 'None'
######
--FE-M-(-)-
 'F' '' 'E' '' 'None'
######
--FEM--
 'F' '' 'E' 'M' 'None'
######
--FE-M--
 'F' '' 'E' '' 'None'
######
</code></pre>
    <p>This is OK for <code>s1</code>, <code>s2</code>, <code>s3</code> and <code>s6</code>, but not for <code>s4</code>, <code>s5</code> and <code>s7</code>.</p>
    <p>Some modification is needed to yield the desired output: getting the <code>M</code> if it exists and the content of the parentheses if the parentheses exist.</p>
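    <p>One possible direction, offered only as a sketch and not as the accepted answer: in <code>rr2</code> the lazy <code>[^FEM]??</code> prefers to match nothing, and because <code>(M*)</code> and the optional parentheses group can both succeed while matching nothing, the engine accepts that first path and lets the trailing <code>.*?$</code> swallow the rest. The variant below makes the two separator slots greedy and excludes <code>(</code> from them, so a separator can never consume the opening parenthesis. The names <code>PATTERN</code> and <code>parse</code> are hypothetical, and the sketch assumes at most one separator character before and after the <code>M</code> run, as in the seven sample strings.</p>

```python
import re

# Assumption: at most one non-F/E/M, non-"(" separator on each side of the M run.
# Greedy "[^FEM(]?" takes the separator when present, so (M*) and the optional
# parentheses group are tried at the right position instead of being skipped.
PATTERN = re.compile(r'^.-(F)([EF]*)(E+)[^FEM(]?(M*)[^FEM(]?(\(.*\))?.*$')


def parse(s):
    """Return the five captured groups as a tuple, or None if s does not match."""
    m = PATTERN.match(s)
    return m.groups() if m else None


samples = [
    '--FE(-)---', '--FEM(-)--', '--FEE(--)-', '--FEE-M(---)--',
    '--FE-M-(-)-', '--FEM--', '--FE-M--',
]

if __name__ == '__main__':
    for s in samples:
        # A group that did not participate in the match is reported as None.
        print('%s -> %r' % (s, parse(s)))
```

    <p>If the separators can be longer than one character, <code>[^FEM(]?</code> would need to become <code>[^FEM(]*</code>; that case is not covered by the sample strings, so it is left untested here.</p>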