<p>One thing you didn't list that is commonly used to detect bad crawlers is hit speed. Good web crawlers spread their hits out so they don't deluge a site with requests. Bad ones will do one of three things:</p>
<ol>
<li>hit sequential links one after the other</li>
<li>hit sequential links in parallel (two or more at a time)</li>
<li>hit sequential links at a fixed interval</li>
</ol>
<p>Also, some offline browsing programs will slurp up a large number of pages. I'm not sure what threshold you'd want to use before you start blocking by IP address.</p>
<p>This method will also catch mirroring programs like fmirror or wget.</p>
<p>If the bot randomizes the time interval, you could check whether the links are traversed in a sequential or depth-first manner, or whether the bot is getting through a huge amount of text (as in words to read) in too short a period of time. Some sites also limit the number of requests per hour.</p>
<p>I also heard an idea somewhere (I don't remember where) that if a user downloads too much data, in terms of kilobytes, they can be presented with a captcha asking them to prove they aren't a bot. I've never seen that implemented, though.</p>
Update on Hiding Links
<p>As far as hiding links goes, you can put one div underneath another with CSS (placing it first in the draw order) and possibly setting the z-index. A bot could not ignore that without parsing all your JavaScript to see whether it is a menu. To some extent, links inside invisible DIV elements also can't be ignored without the bot parsing all the JavaScript.</p>
<p>Taking that idea to its conclusion, uncalled JavaScript that could potentially show the hidden elements would possibly fool a subset of JavaScript-parsing bots, and it is not a lot of work to implement.</p>
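<p>The timing patterns listed above (parallel hits, hits at a fixed interval) can be checked from a client's request timestamps alone. The sketch below is one possible way to do it, not a standard method: the function name <code>classify_timing</code> and every threshold constant are hypothetical and would need tuning against real access logs. A fixed interval shows up as inter-request gaps with a very small spread relative to their mean (coefficient of variation), and parallel fetching shows up as many gaps that are close to zero.</p>

```python
from statistics import mean, pstdev

# Hypothetical thresholds; tune them against your own access logs.
MIN_REQUESTS = 5          # fewer hits than this is too little to judge
MAX_INTERVAL_CV = 0.1     # gaps more regular than this look machine-driven
BURST_WINDOW = 0.05       # seconds; hits closer than this count as simultaneous
MAX_BURST_FRACTION = 0.5  # share of gaps allowed to fall inside a burst


def classify_timing(timestamps: list[float]) -> str:
    """Label one client's request times (seconds) as 'parallel', 'fixed-interval' or 'unclear'."""
    if len(timestamps) < MIN_REQUESTS:
        return "unclear"
    times = sorted(timestamps)
    gaps = [b - a for a, b in zip(times, times[1:])]

    # Parallel fetching: most requests arrive almost at the same moment as the previous one.
    burst_gaps = sum(1 for g in gaps if g < BURST_WINDOW)
    if burst_gaps / len(gaps) > MAX_BURST_FRACTION:
        return "parallel"

    # Fixed interval: the spread of the gaps is tiny compared with their average.
    avg = mean(gaps)
    if avg > 0 and pstdev(gaps) / avg < MAX_INTERVAL_CV:
        return "fixed-interval"

    # Irregular timing could still be a bot that randomizes its delays.
    return "unclear"
```

<p>For example, hits at 0, 2, 4, 6, 8 and 10 seconds are labelled <code>fixed-interval</code>. As noted above, a bot that randomizes its delays will land in <code>unclear</code>, which is where checks on traversal order or reading speed would have to take over.</p>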
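<p>The per-hour request limit and the "too many kilobytes, show a captcha" idea can both be expressed as a sliding-window budget per client. This is a minimal in-memory sketch assuming a single server process and the IP address as the client key; the class name <code>ClientBudget</code> and the default limits are hypothetical, and what the caller does when it returns <code>True</code> (captcha, block) is left open.</p>

```python
from collections import defaultdict, deque

# Hypothetical per-client limits for one window; pick values that suit your site.
WINDOW_SECONDS = 3600          # one hour
MAX_REQUESTS = 1000            # requests allowed per window
MAX_BYTES = 50 * 1024 * 1024   # 50 MiB allowed per window


class ClientBudget:
    """Sliding window of (timestamp, bytes) per client key, e.g. an IP address."""

    def __init__(self, window=WINDOW_SECONDS, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
        self.window = window
        self.max_requests = max_requests
        self.max_bytes = max_bytes
        self.events = defaultdict(deque)  # client -> deque of (time, nbytes)
        self.totals = defaultdict(int)    # client -> bytes served inside the window

    def record(self, client: str, nbytes: int, now: float) -> bool:
        """Log one response; return True if the client is over budget (e.g. show a captcha)."""
        q = self.events[client]
        q.append((now, nbytes))
        self.totals[client] += nbytes
        # Evict events that have slid out of the window and subtract their bytes.
        while q and q[0][0] <= now - self.window:
            _, old_bytes = q.popleft()
            self.totals[client] -= old_bytes
        return len(q) > self.max_requests or self.totals[client] > self.max_bytes
```

<p>A deque per client keeps eviction cheap because the oldest event is always at the left end. On a multi-server site the same counters would have to live in shared storage instead of process memory.</p>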