Retrieving the text of an element in jsoup
<p>While using jsoup to parse HTML pages such as google.com, I ran into a problem retrieving the text of an element.</p>
<p>For example, when I call the <code>text</code> function on this <code>div</code> element, the words "Programs" and "Business" are joined together, which I don't think is right:</p>
<pre><code>&lt;div id="fll" style="margin:19px auto;text-align:center"&gt;
&lt;a href="/intl/en/ads/"&gt;Advertising&amp;nbsp;Programs&lt;/a&gt;
&lt;a href="/services/"&gt;Business Solutions&lt;/a&gt;
&lt;a href="https://plus.google.com/" rel="publisher"&gt;+Google&lt;/a&gt;
&lt;a href="/intl/en/about.html"&gt;About Google&lt;/a&gt;
&lt;/div&gt;
</code></pre>
<p>You can reproduce this with the following code:</p>
<pre><code>import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Fetch the page with a 10-second timeout and print the text of the div
URL url = new URL("http://www.google.com");
Document document = Jsoup.parse(url, 10000);
Element element = document.select("div[id=fll]").first();
System.out.println(element.text());
</code></pre>
<p>The output is:</p>
<pre><code>Advertising ProgramsBusiness Solutions+GoogleAbout Google
</code></pre>
<p>Can anything be done about this?</p>
<p>By the way, I traced the code and found that the problem goes away if I add this line:</p>
<pre><code>textNode.text(textNode.text() + " ");
</code></pre>
<p>between lines 755 and 756 of the <code>Element</code> class in the <code>nodes</code> package of the <code>jsoup</code> source code.</p>
<p>The same problem also exists in the <code>Elements</code> class of the <code>select</code> package, and probably in other <code>text</code> functions.</p>
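One possible workaround that avoids patching jsoup is to read the text of each child link separately and join the pieces with a space. The sketch below uses only the standard library and is not part of jsoup: `SpacedText` and `joinWithSpaces` are hypothetical names, and the hard-coded list stands in for the strings that calling `text()` on each child `a` element would return. Converting the non-breaking space to a plain space is an assumption about the desired output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpacedText {
    // Hypothetical helper: joins the text of sibling elements with a
    // single space, skipping pieces that are empty after trimming.
    static String joinWithSpaces(List<String> parts) {
        List<String> cleaned = new ArrayList<>();
        for (String part : parts) {
            // Assumption: treat a non-breaking space as an ordinary space.
            String text = part.replace('\u00A0', ' ').trim();
            if (!text.isEmpty()) {
                cleaned.add(text);
            }
        }
        return String.join(" ", cleaned);
    }

    public static void main(String[] args) {
        // Stand-in for the text of the four links in the div above.
        List<String> linkTexts = Arrays.asList(
                "Advertising\u00A0Programs",
                "Business Solutions",
                "+Google",
                "About Google");
        // Prints: Advertising Programs Business Solutions +Google About Google
        System.out.println(joinWithSpaces(linkTexts));
    }
}
```

With jsoup on the classpath, the list could be filled by looping over `element.children()` and collecting each child's `text()`. This only separates direct children, so it suits flat structures like the `div` in the question rather than arbitrary nesting.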