
plurals
  1. jsoup times out, XML parsing gives a whitespace error, and basic line-by-line traversal of the page is time-consuming
    <p>I would like to write a program that parses an HTML page, selects the useful information, and displays it. My first attempt opened a stream and searched it line by line for the relevant content, but that was time-consuming. I then decided to treat the page as XML and query it with XPath. I saved the contents of the stream to an XML file on my system and loaded that file, which gave a whitespace error. I then tried parsing the stream directly:</p>
    <pre><code>doc = (Document) builder.parse(inputStream);
</code></pre>
    <p>but the same error persists. After asking here, I was advised to use jsoup for HTML parsing. Now, when I execute this line:</p>
    <pre><code>Document doc = Jsoup.connect(url).get();
</code></pre>
    <p>I get "Read timed out". The same program written in Python, using a naive strategy such as searching with the string <code>find</code> method, displays the contents, and does so quickly. How can I make this work fast in Java?</p>
    <p>Complete code:</p>
    <pre><code>import java.io.*;
import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Parser {
    public static void main(String[] args) {
        Validate.isTrue(true, "usage: supply url to fetch");
        try {
            String url = "http://www.spoj.com/ranks/PRIME1/";
            Document doc = Jsoup.connect(url).get();
            Elements es = doc.getElementsByAttributeValue("class", "lightrow");
            System.out.println(es.get(0).child(0).text());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
</code></pre>
    <p>Exception:</p>
    <pre><code>java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:412)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:393)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:159)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:148)
    at Parser.main(Parser.java:12)
</code></pre>
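The stack trace shows the read timing out while the HTTP response headers are being parsed, so one thing that may be worth trying is an explicit, longer timeout. The sketch below is not taken from the question: it uses only the JDK's `HttpURLConnection`, and the class name `TimeoutSketch`, the helper `prepare`, the 10-second value, and the User-Agent string are all hypothetical choices for illustration. It configures the connection without opening it, so it runs offline. In jsoup, the corresponding setting is `Connection.timeout(int millis)`, chained before `get()`. Whether a longer timeout actually resolves the problem for this particular site is an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutSketch {

    // Hypothetical helper: builds a connection object with explicit timeouts.
    // openConnection() only creates the object; no network traffic happens here.
    static HttpURLConnection prepare(String address, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(timeoutMillis); // limit for establishing the TCP connection
        conn.setReadTimeout(timeoutMillis);    // limit for waiting on response data
        // Assumption: some servers respond slowly, or not at all, to the default Java agent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://www.spoj.com/ranks/PRIME1/", 10_000);
        // Only inspects the configured value; the page is not fetched.
        System.out.println("read timeout (ms): " + conn.getReadTimeout());
    }
}
```

With jsoup, the same idea would look like `Jsoup.connect(url).timeout(10_000).get()`, which keeps the rest of the posted `Parser` code unchanged.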