Python Regular Expression: BackReference
    <p>Here is the Python 2.5 code, which replaces the word <code>fox</code> with a link <code>&lt;a href="/fox"&gt;fox&lt;/a&gt;</code> while avoiding the replacement inside an existing link:</p>
<pre><code>import re

content="""
&lt;div&gt;
    &lt;p&gt;The quick brown &lt;a href='http://en.wikipedia.org/wiki/Fox'&gt;fox&lt;/a&gt; jumped over the lazy Dog&lt;/p&gt;
    &lt;p&gt;The &lt;a href='http://en.wikipedia.org/wiki/Dog'&gt;dog&lt;/a&gt;, who was, in reality, not so lazy, gave chase to the fox.&lt;/p&gt;
    &lt;p&gt;See &amp;quot;Dog chase Fox&amp;quot; image for reference:&lt;/p&gt;
    &lt;img src='dog_chasing_fox.jpg' title='Dog chasing fox'/&gt;
&lt;/div&gt;
"""

p=re.compile(r'(?!((&lt;.*?)|(&lt;a.*?)))(fox)(?!(([^&lt;&gt;]*?)&gt;)|([^&gt;]*?&lt;/a&gt;))',re.IGNORECASE|re.MULTILINE)
print p.findall(content)

for match in p.finditer(content):
    print match.groups()

output=p.sub(r'&lt;a href="/fox"&gt;\3&lt;/a&gt;',content)
print output
</code></pre>
    <p>The output is:</p>
<pre><code>[('', '', '', 'fox', '', '.', ''), ('', '', '', 'Fox', '', '', '')]
('', '', None, 'fox', '', '.', '')
('', '', None, 'Fox', None, None, None)
Traceback (most recent call last):
  File "C:/example.py", line 18, in &lt;module&gt;
    output=p.sub(r'&lt;a href="fox"&gt;\3&lt;/a&gt;',content)
  File "C:\Python25\lib\re.py", line 274, in filter
    return sre_parse.expand_template(template, match)
  File "C:\Python25\lib\sre_parse.py", line 793, in expand_template
    raise error, "unmatched group"
error: unmatched group
</code></pre>
    <ol>
    <li><p>I am not sure why the backreference <code>\3</code> won't work.</p></li>
    <li><p>Surprisingly, <code>(?!((&lt;.*?)|(&lt;a.*?)))(fox)(?!(([^&lt;&gt;]*?)&gt;)|([^&gt;]*?&lt;/a&gt;))</code> works (see <a href="http://regexr.com?317bn" rel="nofollow">http://regexr.com?317bn</a>). The first negative lookahead, <code>(?!((&lt;.*?)|(&lt;a.*?)))</code>, puzzles me. In my opinion, it is not supposed to work. Take the first match it finds, the <code>fox</code> in <code>gave chase to the fox.&lt;/p&gt;</code>: earlier in that paragraph there is <code>&lt;a href='http://en.wikipedia.org/wiki/Dog'&gt;dog&lt;/a&gt;</code>, which matches <code>((&lt;.*?)|(&lt;a.*?))</code>, so I would expect the negative lookahead to fail there. I am not sure whether I have expressed myself clearly.</p></li>
    </ol>
    <p>Thanks a lot!</p>
    <p>(Note: I hate using BeautifulSoup and enjoy writing my own regular expressions. I know many people here will say that regular expressions are not meant for HTML processing, but this is a small program, so I prefer a regular expression over BeautifulSoup.)</p>
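Both points can be checked with a short script. The sketch below is written for Python 3, not the Python 2.5 used in the question, and the names `original`, `simplified`, `fixed` and `linked` are introduced here for illustration only. It rests on two observations. First, groups 1 to 3 sit inside a negative lookahead, which can only succeed when its contents fail to match, so group 3 never takes part in a match; the word itself is captured by group 4. Python versions before 3.5 raise "unmatched group" when a replacement template refers to such a group, while 3.5 and later substitute an empty string. Second, a lookahead inspects the text that follows the current position, not the text before it. Where `fox` begins, the next character is `f` rather than `<`, so the first lookahead always succeeds and never sees the earlier `<a ...>dog</a>` link. The simplified pattern is an assumption about what the question intends, not a general HTML-safe solution.

```python
import re

content = """
<div>
<p>The quick brown <a href='http://en.wikipedia.org/wiki/Fox'>fox</a> jumped over the lazy Dog</p>
<p>The <a href='http://en.wikipedia.org/wiki/Dog'>dog</a>, who was, in reality, not so lazy, gave chase to the fox.</p>
<p>See &quot;Dog chase Fox&quot; image for reference:</p>
<img src='dog_chasing_fox.jpg' title='Dog chasing fox'/>
</div>
"""

# The pattern from the question, unchanged.
original = re.compile(
    r'(?!((<.*?)|(<a.*?)))(fox)(?!(([^<>]*?)>)|([^>]*?</a>))',
    re.IGNORECASE | re.MULTILINE,
)

# Group 4 is the only group outside a lookahead, so it holds the matched word.
words = [m.group(4) for m in original.finditer(content)]

# Point 1: refer to group 4 in the replacement template instead of group 3.
fixed = original.sub(r'<a href="/fox">\4</a>', content)

# Point 2: a hypothetical simplification that drops the first lookahead and
# names the captured word. It should find exactly the same occurrences.
simplified = re.compile(r'(?P<word>fox)(?![^<>]*>|[^>]*</a>)', re.IGNORECASE)
same_spans = (
    [m.span() for m in original.finditer(content)]
    == [m.span() for m in simplified.finditer(content)]
)
linked = simplified.sub(r'<a href="/fox">\g<word></a>', content)

print(words)
print(same_spans)
print(linked)
```

A named group such as `(?P<word>...)` keeps the replacement template correct even if groups are later added to or removed from the pattern, which is the kind of renumbering that makes `\3` easy to get wrong.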