<p>Parsing HTML with regex is not ideal. Others have suggested the HTML Agility Pack. However, if you can guarantee that your input is well-defined and you always know what to expect, then using a regex is possible.</p>

<p>If you can make that guarantee, read on. Otherwise, consider the other suggestions or define your input better. In fact, you should define your input better regardless, because my answer makes a few assumptions. Some questions to consider:</p>

<ul>
<li>Will the HTML be on one line or on multiple lines separated by newline characters?</li>
<li>Will the HTML always be in the form of <code>&lt;div&gt;...&lt;h2...&gt;...&lt;/h2&gt;&lt;h3...&gt;...&lt;/h3&gt;&lt;/div&gt;</code>? Or can there be <code>h1-h6</code> tags?</li>
<li>On top of the <code>hN</code> tags, will the date and number always be between the tags with <code>id-date</code> and <code>nr</code> values for the <code>id</code> attribute?</li>
</ul>

<p>Depending on the answers to these questions, the pattern can change. The following code assumes each HTML fragment follows the structure you shared, that it will have an <code>h2</code> and an <code>h3</code> with the date and number, respectively, and that each tag will be on a new line. If you feed it different input, it will likely fail until you adjust the pattern to match your input's structure.</p>

<pre><code>' Requires: Imports System.Text.RegularExpressions
Dim input As String = "&lt;div id=""div""&gt;" &amp; Environment.NewLine &amp; _
    "&lt;h2 id=""id-date""&gt;09.09.2010&lt;/h2&gt;" &amp; Environment.NewLine &amp; _
    "&lt;h3 id=""nr""&gt;000&lt;/h3&gt;" &amp; Environment.NewLine &amp; _
    "&lt;/div&gt;"

' Named groups "Date" and "Number" capture the two values of interest.
Dim pattern As String = "&lt;div[^&gt;]+&gt;.*?" &amp; _
    "&lt;h2\sid=""id-date""&gt;(?&lt;Date&gt;\d{2}\.\d{2}\.\d{4})&lt;/h2&gt;.*?" &amp; _
    "&lt;h3\sid=""nr""&gt;(?&lt;Number&gt;\d+)&lt;/h3&gt;.*?&lt;/div&gt;"

Dim m As Match = Regex.Match(input, pattern, RegexOptions.Singleline)
If m.Success Then
    ' DateTime.Parse uses the current culture; if the date format is fixed,
    ' DateTime.ParseExact with an explicit format string is the safer choice.
    Dim actualDate As DateTime = DateTime.Parse(m.Groups("Date").Value)
    Dim actualNumber As Integer = Int32.Parse(m.Groups("Number").Value)
    Console.WriteLine("Parsed Date: " &amp; m.Groups("Date").Value)
    Console.WriteLine("Actual Date: " &amp; actualDate)
    Console.WriteLine("Parsed Number: " &amp; m.Groups("Number").Value)
    Console.WriteLine("Actual Number: " &amp; actualNumber)
Else
    Console.WriteLine("No match!")
End If
</code></pre>

<p>The pattern can be on one line, but I broke it up for clarity. <code>RegexOptions.Singleline</code> makes the <code>.</code> metacharacter match newline characters (<code>\n</code>) as well, so <code>.*?</code> can span multiple lines.</p>

<p>You also said:</p>

<blockquote>
<p>Also and this will be in loop, meaning there are more div block needed to be parsed.</p>
</blockquote>

<p>Are you looping over separate strings? Or are you expecting multiple occurrences of the above HTML structure in a single string? If the former, apply the above code to each string. For the latter, use <a href="http://msdn.microsoft.com/en-us/library/b49yw9s8.aspx" rel="nofollow noreferrer"><code>Regex.Matches</code></a> and treat each <code>Match</code> result the same way as in the code above.</p>

<hr>

<p><strong>EDIT:</strong> Here is some sample code that demonstrates parsing multiple occurrences.</p>

<pre><code>' Requires: Imports System.Text.RegularExpressions
' Two div blocks in a single string.
Dim input As String = "&lt;div id=""div""&gt;" &amp; Environment.NewLine &amp; _
    "&lt;h2 id=""id-date""&gt;09.09.2010&lt;/h2&gt;" &amp; Environment.NewLine &amp; _
    "&lt;h3 id=""nr""&gt;000&lt;/h3&gt;" &amp; Environment.NewLine &amp; _
    "&lt;/div&gt;" &amp; _
    "&lt;div id=""div""&gt;" &amp; Environment.NewLine &amp; _
    "&lt;h2 id=""id-date""&gt;09.14.2010&lt;/h2&gt;" &amp; Environment.NewLine &amp; _
    "&lt;h3 id=""nr""&gt;123&lt;/h3&gt;" &amp; Environment.NewLine &amp; _
    "&lt;/div&gt;"

Dim pattern As String = "&lt;div[^&gt;]+&gt;.*?" &amp; _
    "&lt;h2\sid=""id-date""&gt;(?&lt;Date&gt;\d{2}\.\d{2}\.\d{4})&lt;/h2&gt;.*?" &amp; _
    "&lt;h3\sid=""nr""&gt;(?&lt;Number&gt;\d+)&lt;/h3&gt;.*?&lt;/div&gt;"

' Regex.Matches returns one Match per div block found in the input.
For Each m As Match In Regex.Matches(input, pattern, RegexOptions.Singleline)
    Dim actualDate As DateTime = DateTime.Parse(m.Groups("Date").Value)
    Dim actualNumber As Integer = Int32.Parse(m.Groups("Number").Value)
    Console.WriteLine("Parsed Date: " &amp; m.Groups("Date").Value)
    Console.WriteLine("Actual Date: " &amp; actualDate)
    Console.WriteLine("Parsed Number: " &amp; m.Groups("Number").Value)
    Console.WriteLine("Actual Number: " &amp; actualNumber)
Next
</code></pre>