Java HttpClient seems to be caching content
    <p>I'm building a simple web scraper and I need to fetch the same page a few hundred times. The page contains an attribute that is dynamic and should change on every request. I've built a multithreaded, HttpClient-based class to process the requests, and I'm using an <code>ExecutorService</code> to create a thread pool and run the threads. The problem is that the dynamic attribute sometimes doesn't change between requests, and I end up getting the same value on 3 or 4 consecutive threads. I've read a lot about HttpClient and I really can't find where this problem comes from. Could it be something related to caching?</p>
    <p>Update: here is the code executed in each thread:</p>
<pre><code>HttpContext localContext = new BasicHttpContext();

HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, HTTP.DEFAULT_CONTENT_CHARSET);
HttpProtocolParams.setUseExpectContinue(params, true);

ClientConnectionManager connman = new ThreadSafeClientConnManager();
DefaultHttpClient httpclient = new DefaultHttpClient(connman, params);

HttpHost proxy = new HttpHost(inc_proxy, Integer.valueOf(inc_port));
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);

HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("User-Agent",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");

String iden = null;
int timeoutConnection = 10000;
HttpConnectionParams.setConnectionTimeout(httpGet.getParams(), timeoutConnection);

try {
    HttpResponse response = httpclient.execute(httpGet, localContext);
    HttpEntity entity = response.getEntity();

    if (entity != null) {
        InputStream instream = entity.getContent();
        String result = convertStreamToString(instream);
        // System.out.printf("Resultado\n %s",result +"\n");
        instream.close();

        iden = StringUtils.substringBetween(result,
                "&lt;input name=\"iden\" value=\"",
                "\" type=\"hidden\"/&gt;");
        System.out.printf("IDEN:%s\n", iden);
        EntityUtils.consume(entity);
    }
} catch (ClientProtocolException e) {
    // TODO Auto-generated catch block
    System.out.println("Excepção CP");
} catch (IOException e) {
    // TODO Auto-generated catch block
    System.out.println("Excepção IO");
}
</code></pre>
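    <p>To put a number on how often the value repeats, the thread-pool setup described above could be wrapped in a small tally like the one below. This is only a minimal sketch: <code>IdenTally</code>, <code>tally</code>, <code>duplicates</code> and the <code>fetchIden</code> supplier are hypothetical names that are not part of the code above, and the supplier is assumed to stand in for the per-thread HttpClient request that returns the extracted <code>iden</code>. The stub in <code>main</code> performs no network calls.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class IdenTally {

    // Runs `requests` fetches on a fixed-size pool and counts how many times
    // each value came back. `fetchIden` is a hypothetical stand-in for the
    // per-thread request code; a null result is recorded as "<none>".
    static Map<String, Integer> tally(Supplier<String> fetchIden, int requests, int threads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                String iden = fetchIden.get();
                // merge() on ConcurrentHashMap is atomic, so no extra locking is needed.
                counts.merge(iden == null ? "<none>" : iden, 1, Integer::sum);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return counts;
    }

    // Number of responses that repeated a value some other response already returned.
    static int duplicates(Map<String, Integer> counts) {
        int repeated = 0;
        for (int seen : counts.values()) {
            repeated += seen - 1;
        }
        return repeated;
    }

    public static void main(String[] args) {
        // Stub supplier that returns a fresh value on every call (assumption for the demo).
        AtomicInteger counter = new AtomicInteger();
        Supplier<String> stub = () -> "iden-" + counter.incrementAndGet();

        Map<String, Integer> counts = tally(stub, 200, 8);
        System.out.println("distinct=" + counts.size()
                + " duplicates=" + duplicates(counts));
    }
}
```

    <p>Replacing the stub with the real request code would show whether the repeats are spread across the whole run or cluster around requests issued at nearly the same time, which may help tell a caching layer (for example the proxy) apart from a problem in the client code.</p>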