<p>Here is a UTF-8-safe solution that works not only with properly formatted documents but also with document fragments.</p>
<p>The mb_convert_encoding call is needed because loadHTML() seems to have a bug with UTF-8 encoding (see <a href="https://stackoverflow.com/questions/3872423/php-problem-with-russian-language/3872663#3872663">here</a> and <a href="https://stackoverflow.com/questions/2236889/why-does-dom-change-encoding/2238149#2238149">here</a>).</p>
<p>The mb_substr call trims the body tag from the output (6 characters for the opening tag, 7 for the closing tag), so you get your original content back without any additional markup.</p>
<pre><code>&lt;?php
$html = '&lt;p&gt;Match this text and replace it&lt;/p&gt;
&lt;p&gt;Don\'t &lt;a href="/"&gt;match this text&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We still need to match this text and replace itŐŰ&lt;/p&gt;
&lt;p&gt;This is &lt;a href="#"&gt;a link &lt;span&gt;with &lt;strong&gt;don\'t match this text&lt;/strong&gt; content&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;';

$dom = new DOMDocument();
// loadXML() needs properly formatted documents, so it is better to use loadHTML(),
// but that needs a hack to handle UTF-8 encoding properly.
// Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.
$dom-&gt;loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

$xpath = new DOMXPath($dom);

// Visit every text node that is not inside a link
foreach ($xpath-&gt;query('//text()[not(ancestor::a)]') as $node) {
    $replaced = str_ireplace('match this text', 'MATCH', $node-&gt;wholeText);
    // appendXML() expects well-formed XML, so text containing a raw &amp; or &lt;
    // would need to be escaped first
    $newNode = $dom-&gt;createDocumentFragment();
    $newNode-&gt;appendXML($replaced);
    $node-&gt;parentNode-&gt;replaceChild($newNode, $node);
}

// Get only the body tag with its contents, then trim the body tag itself
// to get only the original content
echo mb_substr($dom-&gt;saveXML($xpath-&gt;query('//body')-&gt;item(0)), 6, -7, 'UTF-8');
</code></pre>
<p>References:<br>
<a href="https://stackoverflow.com/questions/3151064/find-and-replace-keywords-by-hyperlinks-in-an-html-fragment-via-php-dom/3151554#3151554">1. find and replace keywords by hyperlinks in an html fragment, via php dom</a><br>
<a href="https://stackoverflow.com/questions/4044812/regex-domdocument-match-and-replace-text-not-in-a-link/4156573#4156573">2. Regex / DOMDocument - match and replace text not in a link</a><br>
<a href="https://stackoverflow.com/questions/3872423/php-problem-with-russian-language/3872663#3872663">3. php problem with russian language</a><br>
<a href="https://stackoverflow.com/questions/2236889/why-does-dom-change-encoding/2238149#2238149">4. Why Does DOM Change Encoding?</a></p>
<p><em>I read dozens of answers on the subject, so I apologize if I have left anyone out (please leave a comment and I will add yours as well).</em></p>
<p>Thanks to Gordon and stillstanding for commenting on <a href="https://stackoverflow.com/questions/4044812/regex-domdocument-match-and-replace-text-not-in-a-link/4192155#4192155">my other answer</a>.</p>
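<p>To illustrate the trimming step on its own, here is a minimal sketch of what the mb_substr offsets do. The $serialized string is a hypothetical stand-in for the output of saveXML() and is not taken from the code above.</p>
<pre><code>&lt;?php
// Hypothetical stand-in for the string returned by $dom-&gt;saveXML($bodyNode)
$serialized = '&lt;body&gt;&lt;p&gt;ŐŰ&lt;/p&gt;&lt;/body&gt;';

// '&lt;body&gt;' is 6 characters and '&lt;/body&gt;' is 7, so skip 6 from the start
// and drop 7 from the end. mb_substr() counts characters, not bytes.
echo mb_substr($serialized, 6, -7, 'UTF-8'); // prints &lt;p&gt;ŐŰ&lt;/p&gt;
</code></pre>
<p>The fixed offsets assume the serialized body tag carries no attributes, which should hold for fragments because the parser adds a plain body tag itself. A full document with something like a class attribute on its body tag would need a different way of stripping the wrapper.</p>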
 
