<blockquote> <p>what I did was to add another alternative at the end, that selects "any sequence that doesn't contain <code>&lt;</code> or <code>&gt;</code>", which then means the leftover text. I named that last bit in a capture group, and when I iterate over the matches, I check for the presence of text in the "text" group.</p> </blockquote>
<p>That's what one would normally do. Or, even simpler, replace every match of the markup pattern with an empty string, and what you've got left is the stuff you're looking for.</p>
<blockquote> <p>It kind of works, but there seems to be a string here and there that gets picked up that shouldn't be.</p> </blockquote>
<p>Well yeah, that's because your expression, and regex in general, is inadequate to parse even valid HTML, let alone the horrors that are out there on the real web. The first thing to look at, if you really want to chase this futile approach: attribute values (as well as text content in general) may contain an unescaped <code>&gt;</code> character.</p>
<p>I would like to once again suggest the benefits of HTML Agility Pack.</p>
<p>ETA: since you seem to want it, here are some examples of markup that look like they'll trip up your expression.</p>
<pre><code>&lt;a href=link&gt;&lt;/a&gt; - unquoted
&lt;a href= link&gt;&lt;/a&gt; - unquoted, space at front matched but then required at back
&lt;a href="~/link"&gt;&lt;/a&gt; - very common URL char missing in group
&lt;a href="link$!*'link"&gt;&lt;/a&gt; - more URL chars missing in group
&lt;a href=lïnk&gt;&lt;/a&gt; - IRI
&lt;a href
="link"&gt; - newline (or tab)
&lt;div style="background-image: url(link);"&gt; - unquoted
&lt;div style="background-image: url( 'link' );"&gt; - spaced
&lt;div style="background-image: u&amp;#114;l('link');"&gt; - html escape
&lt;div style="background-image: ur\l('link');"&gt; - css escape
&lt;div style="background-image: url('link\')link');"&gt; - css escape
&lt;div style="background-image: url(\
'link')"&gt; - CSS folding
&lt;div
style="background-image: url ('link')"&gt; - newline (or tab)
</code></pre>
<p>And that's just completely valid markup that <em>won't</em> match the right link. It doesn't cover any of the possible invalid markup, markup that shouldn't match a link but does, or any of the many problems with your other technique of splitting markup from text. This is the tip of the iceberg.</p>
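<p>To make the unescaped <code>&gt;</code> problem concrete, here is a minimal sketch. It is not HTML Agility Pack: Python's standard-library <code>html.parser</code> stands in for a real parser purely for illustration, and the names <code>NAIVE_TAG</code>, <code>strip_tags_with_regex</code>, <code>TextAndLinkCollector</code> and <code>extract</code> are hypothetical. The naive pattern is also an assumption, not the asker's actual expression. The parser only handles <code>href</code> attributes here; links inside CSS <code>url(...)</code> values would still need a CSS-aware step.</p>

```python
import re
from html.parser import HTMLParser

# Naive tag pattern (an assumption for illustration): everything from '<'
# up to the next '>' is treated as markup.
NAIVE_TAG = re.compile(r"<[^>]*>")


def strip_tags_with_regex(markup):
    """Remove whatever the naive pattern takes to be a tag."""
    return NAIVE_TAG.sub("", markup)


class TextAndLinkCollector(HTMLParser):
    """Collects text nodes and <a href> values while the parser tokenizes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract(markup):
    """Return (text, links) as seen by the parser."""
    collector = TextAndLinkCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.text_parts), collector.links


if __name__ == "__main__":
    sample = '<a href="next" title="a > b">Next page</a>'
    print(repr(strip_tags_with_regex(sample)))  # ' b">Next page'
    print(extract(sample))                      # ('Next page', ['next'])
    print(extract("<a href=link>x</a>"))        # ('x', ['link'])
    print(extract('<a href\n="link">y</a>'))    # ('y', ['link'])
```

<p>The regex stops at the <code>&gt;</code> inside the quoted <code>title</code> value and leaks part of the attribute into the "text", which is the kind of stray string described in the question. The parser tracks quoting state, so it is not fooled, and it also copes with the unquoted and newline cases listed above.</p>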