
<p>Since it isn't clear exactly what you are trying to accomplish, the solution below breaks an HTML document into individual tags and runs of text. There are probably a few corner cases that are not handled, but it does handle attribute strings that contain the end-tag delimiter <code>&gt;</code>. It was written quickly and has not been tested much, so I will leave any necessary fixes up to you. It's not pretty, but it works and should be enough to get you started.</p>
<pre><code>#include &lt;vector&gt;
#include &lt;string&gt;
#include &lt;iostream&gt;

int main()
{
    std::string html("&lt;div style=\"width: 200px;\"&gt;&lt;strong&gt;Balance Sheets (USD $)&lt;br&gt;&lt;/strong&gt;&lt;/div&gt;");
    std::vector&lt;std::string&gt; tags;
    std::vector&lt;std::string&gt; text;

    for(;;)
    {
        std::string::size_type startpos;

        startpos = html.find('&lt;');
        if(startpos == std::string::npos)
        {
            // no tags left, only text (nothing to store if the input is used up)
            if(!html.empty())
            {
                text.push_back(html);
            }
            break;
        }

        // handle the text before the tag
        if(0 != startpos)
        {
            text.push_back(html.substr(0, startpos));
            html = html.substr(startpos, html.size() - startpos);
            startpos = 0;
        }

        // skip all the text in the html tag
        std::string::size_type endpos;
        for(endpos = startpos; endpos &lt; html.size() &amp;&amp; html[endpos] != '&gt;'; ++endpos)
        {
            // since '&gt;' can appear inside of an attribute string we need
            // to make sure we process it properly.
            if(html[endpos] == '"')
            {
                endpos++;
                while(endpos &lt; html.size() &amp;&amp; html[endpos] != '"')
                {
                    endpos++;
                }
            }
        }

        // Handle html that has the beginning of a tag but not the end.
        // &gt;= is needed because an unterminated attribute string leaves
        // endpos one past html.size() after the loop increment.
        if(endpos &gt;= html.size())
        {
            html.clear();
            break;
        }
        else
        {
            // handle the entire tag
            endpos++;
            tags.push_back(html.substr(startpos, endpos - startpos));
            html = html.substr(endpos, html.size() - endpos);
        }
    }

    std::cout &lt;&lt; "tags:\n-----------------" &lt;&lt; std::endl;

    // auto, iterators or range based for loop would probably be better but
    // this makes it a bit easier to read.
    for(size_t i = 0; i &lt; tags.size(); i++)
    {
        std::cout &lt;&lt; tags[i] &lt;&lt; std::endl;
    }

    std::cout &lt;&lt; "\ntext:\n-----------------" &lt;&lt; std::endl;
    for(size_t i = 0; i &lt; text.size(); i++)
    {
        std::cout &lt;&lt; text[i] &lt;&lt; std::endl;
    }
}
</code></pre>
<p>The above code generates the following output (without the space after &lt;, which is added here only because the SO markdown would otherwise interpret it as an HTML tag, as it should):</p>
<blockquote> <h2>tags:</h2> <p>&lt; div style="width: 200px;"><br> &lt; strong><br> &lt; br><br> &lt; /strong><br> &lt; /div> </p> <h2>text:</h2> <p>Balance Sheets (USD $) </p> </blockquote>
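<p>The attribute-string case mentioned above is the easiest one to get wrong, so here is a minimal sketch of the same idea packaged as a function, so that it can be tried on other inputs. The names <code>HtmlPieces</code> and <code>split_html</code> are made up for this illustration and are not part of the code above. It walks the input with an index instead of repeatedly shrinking the string, and like the original it only understands double-quoted attributes and silently drops an unterminated tag.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch: the two output lists.
struct HtmlPieces {
    std::vector<std::string> tags;
    std::vector<std::string> text;
};

// Splits `html` into tags and runs of text. A '>' inside a double-quoted
// attribute value does not end the tag. Comments, scripts, single-quoted
// attributes and CDATA are NOT handled (same limits as the code above).
HtmlPieces split_html(const std::string& html) {
    HtmlPieces out;
    std::size_t pos = 0;
    while (pos < html.size()) {
        const std::size_t lt = html.find('<', pos);
        if (lt == std::string::npos) {
            // No tags left, only trailing text.
            out.text.push_back(html.substr(pos));
            break;
        }
        if (lt > pos) {
            // Text that precedes the tag.
            out.text.push_back(html.substr(pos, lt - pos));
        }
        // Look for the closing '>' while ignoring any '>' inside quotes.
        std::size_t end = lt + 1;
        bool in_quotes = false;
        while (end < html.size() && (in_quotes || html[end] != '>')) {
            if (html[end] == '"') {
                in_quotes = !in_quotes;
            }
            ++end;
        }
        if (end == html.size()) {
            // The tag was opened but never closed: drop it and stop.
            break;
        }
        out.tags.push_back(html.substr(lt, end - lt + 1));
        pos = end + 1;
    }
    return out;
}
```

<p>Calling <code>split_html("&lt;a title=\"1 &gt; 0\"&gt;link&lt;/a&gt;")</code> should give two tags, the first being the whole opening tag including the quoted <code>&gt;</code>, and one text entry, <code>link</code>. For anything beyond simple, well-formed input, a real HTML parser is the safer choice.</p>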
