<p>This might be the solution you are looking for:</p>
<pre><code>from HTMLParser import HTMLParser


class MyParser(HTMLParser):
    def __init__(self, link, keyword):
        HTMLParser.__init__(self)
        self.__html = []
        self.link = link
        self.keyword = keyword

    def handle_data(self, data):
        # Only text nodes reach this method, so tag names are never rewritten.
        text = data.strip()
        self.__html.append(text.replace(self.keyword, '&lt;a href="' + self.link + '"&gt;' + self.keyword + '&lt;/a&gt;'))

    def handle_starttag(self, tag, attrs):
        # Note: attributes of the original tag are not reproduced here.
        self.__html.append("&lt;" + tag + "&gt;")

    def handle_endtag(self, tag):
        self.__html.append("&lt;/" + tag + "&gt;")

    def new_html(self):
        return ''.join(self.__html).strip()


parser = MyParser("blah", "keyword")
parser.feed("&lt;div&gt;&lt;p&gt;Text with keyword here&lt;/p&gt;&lt;/div&gt;")
parser.close()
print parser.new_html()
</code></pre>
<p>This will give you the following output:</p>
<pre><code>&lt;div&gt;&lt;p&gt;Text with &lt;a href="blah"&gt;keyword&lt;/a&gt; here&lt;/p&gt;&lt;/div&gt;
</code></pre>
<p>The problem with your lxml approach seems to occur only when the keyword is nested a single level deep. It seems to work fine with deeper nesting. So I added an if condition to handle that case.</p>
<pre><code>import re

from lxml.html import fromstring, tostring


def markup_aware_sub(pattern, repl, text):
    exp = re.compile(pattern)
    root = fromstring(text)
    # Keep only the elements that carry non-blank text.
    els = [el for el in root.iter() if el.text]
    els = [el for el in els if el.text.strip()]

    if len(els) == 1:
        # Special case: a single text-bearing element has no parent to
        # replace it in, so build the new element and return it directly.
        el = els[0]
        text = exp.sub(repl, el.text)
        new_el = fromstring(text)
        new_el.tag = el.tag
        for k, v in el.attrib.items():
            new_el.attrib[k] = v
        return tostring(new_el)

    for el in els:
        text = exp.sub(repl, el.text)
        if text == el.text:
            continue  # nothing matched in this element
        parent = el.getparent()
        new_el = fromstring(text)
        new_el.tag = el.tag
        for k, v in el.attrib.items():
            new_el.attrib[k] = v
        parent.replace(el, new_el)
    return tostring(root)


print markup_aware_sub('keyword', '&lt;a&gt;blah&lt;/a&gt;', '&lt;p&gt;Text with keyword here&lt;/p&gt;')
</code></pre>
<p>Not very elegant, but it seems to work. Please check it out.</p>
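<p>For Python 3, where the <code>HTMLParser</code> module moved to <code>html.parser</code>, the parser-based idea could be sketched roughly as below. This is only a minimal sketch, not a tested drop-in replacement: the names <code>KeywordLinker</code> and <code>link_keyword</code> are hypothetical, and it assumes the input is fed in one piece and that the keyword itself contains no HTML special characters. Unlike the short version, it copies start tags verbatim, so attributes survive, and it passes entities through unchanged.</p>

```python
from html import escape
from html.parser import HTMLParser


class KeywordLinker(HTMLParser):
    """Rebuild the markup, wrapping the keyword in a link inside text nodes only."""

    def __init__(self, link, keyword):
        # convert_charrefs=False keeps entities such as &amp; out of handle_data,
        # so they can be written back exactly as they appeared.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.keyword = keyword
        # Assumption: the keyword is plain text and needs no escaping itself.
        self.anchor = '<a href="{}">{}</a>'.format(escape(link, quote=True), keyword)

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag as written, attributes included.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        # Only text between tags arrives here, never tag names or attributes.
        self.parts.append(data.replace(self.keyword, self.anchor))

    def handle_entityref(self, name):
        self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        self.parts.append("&#{};".format(name))

    def new_html(self):
        return "".join(self.parts)


def link_keyword(html, link, keyword):
    """Return html with every keyword occurrence in text wrapped in <a href=link>."""
    parser = KeywordLinker(link, keyword)
    parser.feed(html)
    parser.close()
    return parser.new_html()


if __name__ == "__main__":
    print(link_keyword("<div><p>Text with keyword here</p></div>", "blah", "keyword"))
```

<p>Note that this sketch would also wrap a keyword that already sits inside an existing link or inside a <code>script</code> element, so those cases would need extra handling.</p>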
 
