Java/JSoup Plaintext Extraction and Storage
<p>I am trying to solve the following problem.</p>
<p>Assume I have an HTML file that reads:</p>
<hr>
<pre><code>&lt;div class = nameCouldBeAnything1&gt;&lt;br&gt;
&lt;p&gt;some text here&lt;/p&gt;&lt;br&gt;
&lt;/div&gt;
&lt;div class = nameCouldBeAnything2&gt;&lt;br&gt;
&lt;p&gt;some more text here&lt;/p&gt;&lt;br&gt;
&lt;/div&gt;
&lt;div class = nameCouldBeAnything3&gt;&lt;br&gt;
&lt;p&gt;even more text here&lt;/p&gt;&lt;br&gt;
&lt;p&gt;and here&lt;/p&gt;&lt;br&gt;
&lt;p&gt;and here&lt;/p&gt;&lt;br&gt;
&lt;p&gt;and here&lt;/p&gt;&lt;br&gt;
&lt;p&gt;and here&lt;/p&gt;&lt;br&gt;
&lt;/div&gt;
</code></pre>
<hr>
<p>What I am trying to achieve is to store the text between each pair of <strong>div</strong> tags in its own string or string array variable.</p>
<p>A Jsoup solution would be ideal. If there isn't one, a regex that matches from p to /p would also work.</p>
<p><strong>The challenges to take into consideration are:</strong></p>
<p>1) You cannot use specific div class names to locate the p tags when extracting the plaintext with Jsoup, because the class names can be anything.</p>
<p>2) Using <code>doc.select("body p")</code> or <code>doc.select("div p")</code> from Jsoup returns the right text, but it yields one flat list of p tags, so each p ends up in its own variable instead of being grouped by its parent <strong>div</strong>.</p>
<p>This is what I have so far:</p>
<pre><code>htmlFile = Jsoup.parse(input, "UTF-8");
Elements body = htmlFile.select("body p");
Element bodyStart = body.first();
Element bodyEnd = body.last();
Element p = bodyStart;
int divCount = 0;
while (p != bodyEnd) {
    p = body.get(divCount);
    System.out.println(p.text());
    divCount++;
}
</code></pre>
<p>This prints each individual p tag. However, I want the p tags to stay grouped within their respective divs, with each div stored in its own string or string array variable.</p>
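The grouping asked for above can be illustrated with the regex fallback the question mentions, using only the Java standard library. This is a minimal sketch, not a robust HTML parser: it assumes the divs are not nested, that every p is properly closed, and that the text inside each p contains no further markup or entities to decode. The class name `DivParagraphs` and the method `paragraphsByDiv` are hypothetical names chosen for this example. With Jsoup, the analogous idea would be to select every div first and then select the p elements inside each one, rather than selecting all p elements at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups paragraph text by enclosing div, ignoring class names.
class DivParagraphs {
    // One <div ...>...</div> block. Assumption: divs are not nested.
    private static final Pattern DIV = Pattern.compile(
            "<div\\b[^>]*>(.*?)</div>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    // One <p ...>...</p> element inside a div body.
    private static final Pattern P = Pattern.compile(
            "<p\\b[^>]*>(.*?)</p>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns one inner list per div, holding the trimmed content of each of its p tags.
    static List<List<String>> paragraphsByDiv(String html) {
        List<List<String>> result = new ArrayList<>();
        Matcher div = DIV.matcher(html);
        while (div.find()) {
            List<String> paragraphs = new ArrayList<>();
            Matcher p = P.matcher(div.group(1));
            while (p.find()) {
                paragraphs.add(p.group(1).trim());
            }
            result.add(paragraphs);
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
                "<div class = nameCouldBeAnything1><br><p>some text here</p><br></div>"
              + "<div class = nameCouldBeAnything2><br><p>some more text here</p><br></div>"
              + "<div class = nameCouldBeAnything3><br><p>even more text here</p><br>"
              + "<p>and here</p><br></div>";
        // Print the paragraphs of each div on its own line.
        for (List<String> paragraphs : paragraphsByDiv(html)) {
            System.out.println(paragraphs);
        }
    }
}
```

Each inner list can be turned into a string array with `toArray(new String[0])`, or joined into a single string with `String.join("\n", paragraphs)`, depending on which of the two storage forms is wanted.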