<p>Why so many down-votes? Because YOU wouldn't parse HTML with regex, he's not allowed to? That's very short-sighted. <a href="http://suamere.com/Apps/Regex/ParsingHtml.aspx" rel="nofollow">ParsingHTML</a></p> <p>A large percentage of the time, I've seen HtmlAgilityPack fail to properly parse a horribly malformed HTML document, or fail to parse concatenated or nested HTML documents from mass captures. I've also seen XPath in any form fail because an HTML document is dynamically created, isn't consistent, and doesn't necessarily contain identifying properties. Why pull in extra dependencies and work around sloppy markup when a very simple regex can be more reliable anyway?</p> <p>What if you have a large project where a single method just has to pull out the contents of a DIV from an input HTML document? It isn't an entire HTML parsing project; a single regex is all that's necessary. Is your answer to include more imports and build a whole new framework for that? I do hundreds of projects a year. Half use DOM/XPath; the other half simply can't, and require regex.</p> <p>In short, don't be so short-sighted. Reference XPath/DOM tools, but help answer the question. Don't just down-vote. We aren't Neanderthals who need to constantly laugh about an ancient "Don't Parse HTML with Regex" post made forever ago.</p> <p>The answer(s) follow:</p> <p>First, the simplest one:</p> <pre><code>(?s)&lt;div.*?&gt;(.*?)&lt;/div&gt; </code></pre> <p>Need a div with a particular class?</p> <pre><code>(?s)&lt;div[^&gt;]*?class="txt-block"[^&gt;]*?&gt;(.*?)&lt;/div&gt; </code></pre> <p>Want to save CPU and avoid unnecessary backtracking?</p> <pre><code>&lt;div[^&gt;]*?class="txt-block"[^&gt;]*?&gt;(([^&lt;]*(?(?!&lt;/div&gt;)&lt;))*)&lt;/div&gt; </code></pre> <p>The patterns above assume you don't have nested DIV elements. That's where the advice against using regex really comes into play, unless you are using C#/.NET, in which case you'd just do this:</p> <pre><code>(?xm) (?&gt; &lt;(?&lt;Tagname&gt;div)[^&gt;]*?class="txt-block"[^&gt;]*&gt; ) (?(Tagname) ( &lt;/(?(?!\k'Tagname')(?&lt;-Tagname&gt;))*\k'Tagname'&gt;(?&lt;-Tagname&gt;) | (?&gt; &lt;(?&lt;Tagname&gt;[a-z][^\s&gt;]*)[^&gt;]*&gt; ) | [^&lt;]+ )+? (?(Tagname)(?!)) ) </code></pre> <p>Or, the single-line version:</p> <pre><code>(?m)(?&gt;&lt;(?&lt;Tagname&gt;div)[^&gt;]*?class="txt-block"[^&gt;]*&gt;)(?(Tagname)(&lt;/(?(?!\k'Tagname')(?&lt;-Tagname&gt;))*\k'Tagname'&gt;(?&lt;-Tagname&gt;)|(?&gt;&lt;(?&lt;Tagname&gt;[a-z][^\s&gt;]*)[^&gt;]*&gt;)|[^&lt;]+)+?(?(Tagname)(?!))) </code></pre> <p>Pick your poison. Regex is more powerful and reliable than people think. The most complex example I posted won't work in RegexBuddy, but it will work in any version of the .NET Framework. RegexBuddy doesn't support balancing groups, which are a feature of the .NET regex flavor.</p>
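<p>For readers outside .NET, here is a minimal Python sketch of the first two patterns above, using the standard <code>re</code> module. The sample HTML strings and the helper name <code>first_div_contents</code> are hypothetical, made up for illustration. The last call shows the nested-DIV limitation mentioned above: the lazy match stops at the first closing tag.</p>

```python
import re

# Patterns copied from the answer above (HTML entities decoded).
ANY_DIV = re.compile(r'(?s)<div.*?>(.*?)</div>')
TXT_BLOCK_DIV = re.compile(r'(?s)<div[^>]*?class="txt-block"[^>]*?>(.*?)</div>')


def first_div_contents(pattern, html):
    """Return the captured contents of the first match, or None if no match."""
    match = pattern.search(html)
    return match.group(1) if match else None


# Hypothetical sample inputs, not taken from the original question.
flat = '<div id="a">first</div>\n<div class="txt-block" id="b">line 1\nline 2</div>'
nested = '<div class="txt-block">outer <div>inner</div> tail</div>'

print(first_div_contents(ANY_DIV, flat))          # contents of the first div of any kind
print(first_div_contents(TXT_BLOCK_DIV, flat))    # contents of the txt-block div, spanning lines
print(first_div_contents(TXT_BLOCK_DIV, nested))  # cut short at the inner closing tag
```

<p>Python's <code>re</code> does not support .NET balancing groups or lookaround conditionals, so the third and fourth patterns are not shown here.</p>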