<p>I think you need to traverse the tree. Calling <strong>text()</strong> on an Element returns all of the Element's text, including text within child elements. Hopefully something like the following code will be helpful to you:</p>
<pre><code>import java.io.File;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.io.FileUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ScreenScrape {

    public static void main(String[] args) throws IOException {
        // Pass the encoding explicitly rather than relying on the platform default
        String content = FileUtils.readFileToString(new File("test.html"), "UTF-8");
        Document doc = Jsoup.parse(content);
        Element body = doc.body();
        //System.out.println(body.toString());
        StringBuilder sb = new StringBuilder();
        traverse(body, sb);
        System.out.println(sb.toString());
    }

    // Depth-first walk that re-emits the markup, wrapping long words as it goes
    private static void traverse(Node n, StringBuilder sb) {
        if (n instanceof Element) {
            // Opening tag, with attributes if there are any
            sb.append('&lt;');
            sb.append(n.nodeName());
            if (n.attributes().size() &gt; 0) {
                sb.append(n.attributes().toString());
            }
            sb.append('&gt;');
        }
        if (n instanceof TextNode) {
            TextNode tn = (TextNode) n;
            if (!tn.isBlank()) {
                sb.append(spanifyText(tn.text()));
            }
        }
        for (Node c : n.childNodes()) {
            traverse(c, sb);
        }
        if (n instanceof Element) {
            // Closing tag
            sb.append("&lt;/");
            sb.append(n.nodeName());
            sb.append('&gt;');
        }
    }

    // Wraps every token longer than three characters in a span.
    // Callers must pass non-blank text (see the isBlank() check above).
    private static String spanifyText(String text) {
        StringBuilder sb = new StringBuilder();
        StringTokenizer st = new StringTokenizer(text);
        String token;
        while (st.hasMoreTokens()) {
            token = st.nextToken();
            if (token.length() &gt; 3) {
                sb.append("&lt;span&gt;");
                sb.append(token);
                sb.append("&lt;/span&gt;");
            } else {
                sb.append(token);
            }
            sb.append(' ');
        }
        // Drop the trailing space added after the last token
        return sb.substring(0, sb.length() - 1);
    }
}
</code></pre>
<hr>
<p><strong>UPDATE</strong></p>
<p>Using Jonathan's new Jsoup <strong>List&lt;TextNode&gt; Element.textNodes()</strong> method and combining it with MarcoS's suggested NodeTraversor/NodeVisitor technique, I came up with the following (although I am modifying the tree whilst traversing it, which is probably a bad idea):</p>
<pre><code>// Additional imports needed: java.util.ArrayList, org.jsoup.parser.Tag,
// org.jsoup.select.NodeTraversor, org.jsoup.select.NodeVisitor
Document doc = Jsoup.parse(content);
Element body = doc.body();
NodeTraversor nd = new NodeTraversor(new NodeVisitor() {

    @Override
    public void tail(Node node, int depth) {
        if (node instanceof Element) {
            boolean foundLongWord;
            Element elem = (Element) node;
            Element span;
            String token;
            StringTokenizer st;
            ArrayList&lt;Node&gt; changedNodes;
            Node currentNode;
            // textNodes() covers only this element's direct text children
            for (TextNode tn : elem.textNodes()) {
                foundLongWord = false;
                changedNodes = new ArrayList&lt;Node&gt;();
                st = new StringTokenizer(tn.text());
                while (st.hasMoreTokens()) {
                    token = st.nextToken();
                    if (token.length() &gt; 3) {
                        foundLongWord = true;
                        span = new Element(Tag.valueOf("span"), elem.baseUri());
                        span.appendText(token);
                        changedNodes.add(span);
                        // Keep a separating space so adjacent words do not run together
                        changedNodes.add(new TextNode(" ", elem.baseUri()));
                    } else {
                        changedNodes.add(new TextNode(token + " ", elem.baseUri()));
                    }
                }
                if (foundLongWord) {
                    // Swap the original text node for the first new node,
                    // then chain the remaining nodes after it in order
                    currentNode = changedNodes.remove(0);
                    tn.replaceWith(currentNode);
                    for (Node n : changedNodes) {
                        currentNode.after(n);
                        currentNode = n;
                    }
                }
            }
        }
    }

    @Override
    public void head(Node node, int depth) {
    }
});
nd.traverse(body);
System.out.println(body.toString());
</code></pre>
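<p>One side effect of the <strong>StringTokenizer</strong> approach in <strong>spanifyText</strong> is that runs of whitespace collapse to a single space, because the tokenizer discards its delimiters. If the original spacing matters, a regex-based variant may be worth trying. The following is only a sketch using the JDK alone (no Jsoup); the class name <strong>SpanifySketch</strong> and the method name <strong>spanify</strong> are hypothetical and not part of the code above.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanifySketch {

    // Four or more consecutive non-whitespace characters,
    // i.e. the same "token.length() > 3" rule as spanifyText.
    private static final Pattern LONG_WORD = Pattern.compile("\\S{4,}");

    // Wraps each long word in a span and copies everything else,
    // including the original whitespace, through unchanged.
    public static String spanify(String text) {
        Matcher m = LONG_WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // untouched text before the match
            out.append("<span>").append(m.group()).append("</span>");
            last = m.end();
        }
        out.append(text, last, text.length());   // remainder after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: the <span>quick</span>  <span>brown</span> fox
        System.out.println(spanify("the quick  brown fox"));
    }
}
```

<p>Like the original helper, this sketch does not HTML-escape the text it wraps, so it assumes the input is already safe to emit as markup.</p>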