Apache httpclient returning page before it loads?
<p>I noticed a strange phenomenon when using the Apache HttpClient libraries, and I want to know why it occurs. I created some sample code to demonstrate. Consider the following code:</p>
<pre><code>// Example URL
String url = "http://rads.stackoverflow.com/amzn/click/05961580";
GetMethod get = new GetMethod(url);

// Retry once; do not retry requests that were already sent
HttpMethodRetryHandler httpHandler = new DefaultHttpMethodRetryHandler(1, false);
get.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, httpHandler);
get.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);

HttpConnectionManager connectionManager = new SimpleHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
client.getParams().setParameter("http.useragent", FIREFOX);

String line;
StringBuilder stringBuilder = new StringBuilder();
String toStreamBody = null;
String toStringBody = null;

try {
    int statusCode = client.executeMethod(get);
    if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Internet Status: " + HttpStatus.getStatusText(statusCode));
        System.err.println("While getting page: " + url);
    }

    // Read the body as a String
    toStringBody = get.getResponseBodyAsString();

    // Read the body as a stream, line by line
    InputStreamReader isr = new InputStreamReader(get.getResponseBodyAsStream());
    BufferedReader rd = new BufferedReader(isr);
    while ((line = rd.readLine()) != null) {
        stringBuilder.append(line);
    }
} catch (java.io.IOException ex) {
    System.out.println("Failed to get page: " + url);
} finally {
    get.releaseConnection();
}
toStreamBody = stringBuilder.toString();
</code></pre>
<p>This code prints nothing:</p>
<pre><code>System.out.println(toStringBody); // ""
</code></pre>
<p>This code prints the web page:</p>
<pre><code>System.out.println(toStreamBody); // "Whole Page"
</code></pre>
<p>But it gets even stranger. Replace:</p>
<pre><code>get.getResponseBodyAsString();
</code></pre>
<p>With:</p>
<pre><code>get.getResponseBodyAsString(150000);
</code></pre>
<p>Now we get the error: Failed to get page: <code>http://www.amazon.com/gp/offer-listing/0596158068/ref=dp_olp_used?ie=UTF8</code></p>
<p>I was unable to find a website other than Amazon that reproduces this behavior, but I assume there are others.</p>
<p>I am aware that the documentation at <code>http://hc.apache.org/httpclient-3.x/performance.html</code> discourages the use of <code>getResponseBodyAsString()</code>. However, it does not say that the page will fail to load, only that you risk an out-of-memory exception. Is it possible that <code>getResponseBodyAsString()</code> is returning the page before it loads? Why does this only happen with Amazon?</p>
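For reference, the stream-based path above can be written as one bounded read. The following is a minimal sketch that uses only the JDK. The class name `BoundedBodyReader`, the method `readBody`, and the assumption that the caller already knows the response charset are all hypothetical and not part of HttpClient. The in-memory stream in `main` stands in for `get.getResponseBodyAsStream()`. Unlike the `readLine()` loop, this version keeps line terminators. It does not explain the Amazon behavior; it only gives a size-capped way to read a body stream into a string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
final class BoundedBodyReader {

    /**
     * Reads the whole stream into a String, decoding with the given charset.
     * Throws IOException if the body is longer than maxBytes.
     */
    static String readBody(InputStream in, Charset charset, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("Response body exceeds " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for get.getResponseBodyAsStream(); no network access is needed.
        InputStream body = new ByteArrayInputStream(
                "<html>Whole Page</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBody(body, StandardCharsets.UTF_8, 150000));
    }
}
```

In the code from the question, the call would replace the `BufferedReader` loop, for example `toStreamBody = BoundedBodyReader.readBody(get.getResponseBodyAsStream(), StandardCharsets.UTF_8, 150000);`, assuming the page is UTF-8 encoded.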