Reading directly from a URL in Java

<p>When I print the contents of <a href="http://www.amazon.com/s/ref=sr_pg_3?rh=n%3A172282&amp;page=1" rel="nofollow">http://www.amazon.com/s/ref=sr_pg_3?rh=n%3A172282&amp;page=1</a>, I get different HTML from what my browser's "View Source" feature shows (Chrome, in my case, though I don't think the exact browser matters). For example, the div with id "result_10" at that URL looks like this in the browser:</p>
<pre><code>&lt;div id="result_10" class="rsltGrid prod" name="B007I5JT4S"&gt;</code></pre>
<p>But when I print the same page with Java's <code>java.net.URL</code> class, the same div looks like this:</p>
<pre><code>&lt;div class="result product" id="result_10" name="B007I5JT4S"&gt;</code></pre>
<p>This is just one of many differences in identifiers and page structure between the HTML my program reads and the HTML the browser shows. I'm not sure whether this stems from some sort of URL resolution issue or from something else entirely.</p>
<p>How can I get the same page content I see in my browser from a Java application?</p>
<p>Here's the method I've been using to read URLs, with "http://www.amazon.com/s/ref=sr_pg_3?rh=n%3A172282&amp;page=1" as the argument in question.</p>
<pre><code>import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// Prints the raw HTML returned for the given URL, one line at a time.
public static void printWebPageContents(String url) throws IOException {
    URL specifiedUrl = new URL(url);
    // try-with-resources closes the reader (and the underlying stream) even if reading fails.
    try (BufferedReader in = new BufferedReader(new InputStreamReader(specifiedUrl.openStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
        }
    }
}
</code></pre>
<p>Don't hesitate to let me know if any clarification is needed.</p>
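One plausible cause, offered as an assumption rather than a confirmed diagnosis, is that the server tailors its HTML to the request headers. By default, a java.net.URL request identifies itself with a User-Agent of the form "Java/<version>" and sends none of the cookies or Accept-Language preferences a browser sends, so the server may choose a different page template. The minimal sketch below sends browser-like headers instead. The class name BrowserLikeFetch, the exact header values, and the UTF-8 decoding are illustrative assumptions, not details from the original question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: fetches a page while presenting browser-like request headers.
final class BrowserLikeFetch {

    // Example browser-like User-Agent (an assumption); copy the exact string your own browser sends.
    static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    // Creates a connection object and sets browser-like request headers on it.
    // No network traffic happens here; the request is sent when getInputStream() is called.
    static URLConnection openBrowserLikeConnection(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_USER_AGENT);
        conn.setRequestProperty("Accept", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8");
        conn.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
        return conn;
    }

    // Reads a character stream line by line into one String, then closes the stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Sends the request with the headers above; assumes the response body is UTF-8 encoded.
    static String fetch(String url) throws IOException {
        URLConnection conn = openBrowserLikeConnection(url);
        return readAll(conn.getInputStream(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java BrowserLikeFetch <url>");
            return;
        }
        System.out.print(fetch(args[0]));
    }
}
```

To use it, pass the page URL as the first command-line argument. If the output still differs, compare it against the full set of request headers Chrome sends (visible in the DevTools Network tab). Cookies in particular may also influence the response.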