<p>Yes, there is. It's a bit complicated, but it works well for us.</p>
<p>Basically, we change the Filter chain for sessions. We do this for bots (Google, Pear, Yahoo).</p>
<p>Create a new Filter and register it, then use this source for the Filter class:</p>
<pre><code>import java.io.IOException;
import java.util.HashMap;

import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class BotFilter implements javax.servlet.Filter {
    // Session timeout applied to bot sessions (5 minutes)
    private int inactive_seconds = 5 * 60;

    // Lower-case substrings of the User-Agent header that identify a bot
    private String[] bots = new String[] {
        "googlebot",          // google
        "msnbot",             // msn
        "slurp",              // yahoo
        "libcurl",            // curl, sometimes used with bigbrother
        "bigbrother",         // bigbrother availability check
        "whatsup",            // whatsup availability check
        "surveybot",          // unknown
        "wget",               // no comment
        "speedyspider",       // http://www.entireweb.com/about/search_tech/speedyspider/
        "nagios-plugins",     // all Nagios checks
        "pear.php.net",       // some PHP junk
        "mj12bot",            // http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
        "bingbot",            // Microsoft Bing
        "dotbot",             // We are just a few Seattle based guys trying to figure out how to make internet data as open as possible.
        "aggregator:spinn3r", // http://spinn3r.com/robot
        "baiduspider"         // http://www.baidu.com/search/spider.htm
    };

    // Maps remote address + user agent to the session created for that bot
    private HashMap&lt;String, HttpSession&gt; botsessions;

    public BotFilter() {
        this.botsessions = new HashMap&lt;String, HttpSession&gt;();
    }

    public void init(FilterConfig config) throws ServletException {
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException {
        if (request instanceof HttpServletRequest) {
            HttpServletRequest httprequest = (HttpServletRequest) request;
            try {
                String useragent = httprequest.getHeader("User-Agent");
                if (useragent == null) {
                    ((HttpServletResponse) response).sendRedirect("http://www.google.com");
                    return; // the response is committed; do not continue down the chain
                }
                useragent = useragent.toLowerCase();
                for (int i = 0; i &lt; this.bots.length; i++) {
                    if (useragent.indexOf(this.bots[i]) &gt; -1) {
                        String key = httprequest.getRemoteAddr() + useragent;
                        boolean sessionIsInvalid = false;
                        synchronized (this.botsessions) {
                            try {
                                // Throws IllegalStateException if the stored session has been invalidated
                                if (this.botsessions.get(key) != null)
                                    this.botsessions.get(key).getAttributeNames();
                            } catch (java.lang.IllegalStateException ise) {
                                sessionIsInvalid = true;
                            }
                            if (this.botsessions.get(key) == null || sessionIsInvalid) {
                                httprequest.getSession().setMaxInactiveInterval(this.inactive_seconds);
                                if (sessionIsInvalid)
                                    this.botsessions.remove(key); // Remove first, if in there
                                this.botsessions.put(key, httprequest.getSession()); // Then add the new one
                            } else {
                                // Known bot with a live session: hand the stored session to the application
                                next.doFilter(new BotFucker(httprequest, this.botsessions.get(key)), response);
                                return;
                            }
                        }
                    }
                }
            } catch (Exception e) {
                // Error handling code
            }
        }
        next.doFilter(request, response);
    }

    public void destroy() {
    }
}
</code></pre>
<p>And this little one for the request wrapper class that hands back the stored session:</p>
<pre><code>import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import javax.servlet.http.HttpSession;

public class BotFucker extends HttpServletRequestWrapper {
    HttpSession session;

    public BotFucker(HttpServletRequest request, HttpSession session) {
        super(request);
        this.session = session;
    }

    @Override
    public HttpSession getSession(boolean create) {
        return this.session;
    }

    @Override
    public HttpSession getSession() {
        return this.session;
    }
}
</code></pre>
<p>These two classes re-use the sessions that the bots had before, if they connect again using the same IP within a given time limit. We're not 100% sure what this does to the data that the bot receives, but this code has been running for many months now and has solved our problem (multiple connects/sessions per second per IP from Google).</p>
<p>And before somebody tries to help: The problem has been submitted multiple times to Google via the Webmaster interface. The crawling interval has been lowered to the lowest possible setting, and the problem spawned a thread with three replies on the appropriate forum without any hint as to why this problem exists.</p>
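<p>To make the re-use rule easier to reason about, here is a minimal, servlet-free sketch of the same idea: one session per remote address plus user agent, replaced once it has been idle longer than the time limit. The class <code>BotSessionCache</code>, its methods, the shortened bot list and the use of plain session id strings are all assumptions made for illustration; they are not part of the filter above, which stores real <code>HttpSession</code> objects and relies on the container's inactivity timeout instead.</p>

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, servlet-free model of the per-bot session re-use rule. */
public class BotSessionCache {
    // Shortened signature list, for illustration only.
    private static final String[] BOTS = {"googlebot", "bingbot", "slurp"};

    private static final class Entry {
        final String sessionId;
        long lastSeenMillis;

        Entry(String sessionId, long lastSeenMillis) {
            this.sessionId = sessionId;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    private final long maxInactiveMillis;
    private final Map<String, Entry> entries = new HashMap<>();

    public BotSessionCache(long maxInactiveMillis) {
        this.maxInactiveMillis = maxInactiveMillis;
    }

    /** True if the user agent contains one of the known bot signatures. */
    public static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String lower = userAgent.toLowerCase(Locale.ROOT);
        for (String bot : BOTS) {
            if (lower.contains(bot)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the session id to use for this request: the stored one if the
     * same address and user agent were seen within the time limit, otherwise
     * freshSessionId, which then becomes the stored one.
     */
    public synchronized String sessionFor(String remoteAddr, String userAgent,
                                          String freshSessionId, long nowMillis) {
        String key = remoteAddr + userAgent.toLowerCase(Locale.ROOT);
        Entry entry = entries.get(key);
        if (entry == null || nowMillis - entry.lastSeenMillis > maxInactiveMillis) {
            entry = new Entry(freshSessionId, nowMillis);
            entries.put(key, entry);
        } else {
            entry.lastSeenMillis = nowMillis; // activity keeps the entry alive
        }
        return entry.sessionId;
    }
}
```

<p>Passing the clock value in as a parameter is only a convenience for testing the rule; a real filter would more likely keep delegating expiry to the servlet container, as the code above does.</p>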