    <p>Complicated stuff.</p> <p>In my experience, which pages a crawler reaches depends mostly on the URL scheme you use to link pages together.</p> <ul> <li><p>Most engines crawl the entire website if it is all <em>properly hyperlinked</em> with <em>crawl-friendly URLs</em>, e.g. it uses URL rewriting instead of topicID=123 query strings, and every page is reachable within a few clicks of the main page.</p></li> <li><p>Paging is another case: sometimes the bot crawls just the first page and stops when it finds that the next-page link keeps hitting the same document, e.g. one index.php for the entire website.</p></li> <li><p>You wouldn't want a bot to accidentally hit a page that performs an action, e.g. a "Delete topic" link that points to "delete.php?topicID=123", so most crawlers check for those cases as well.</p></li> <li><p>The <a href="http://www.seomoz.org/tools" rel="noreferrer">Tools page at SEOmoz</a> also provides a lot of information and insight into how some crawlers work and what information they extract. You could use those tools to determine whether pages deep inside your forum, e.g. a year-old post, get crawled or not.</p></li> <li><p>Some crawlers let you customize their crawling behavior, for example through <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=40318" rel="noreferrer">Google Sitemaps</a>. You can tell them which pages to crawl, which to skip, and in which order. I remember similar services being available from MSN and Yahoo as well, but I have never tried them myself.</p></li> <li><p>You can throttle the crawling bot so it doesn't overwhelm your website by providing a <a href="http://www.robotstxt.org/" rel="noreferrer">robots.txt</a> file in the website root.</p></li> </ul> <p>Basically, if you design your forum so that the URLs don't look hostile to crawlers, they'll merrily crawl the entire website.</p>
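    <p>As a rough illustration of the robots.txt point, the sketch below uses Python's standard <code>urllib.robotparser</code> module to check how a well-behaved crawler would read such a file. The host <code>forum.example</code>, the <code>ExampleBot</code> user agent, the rules, and the 10-second delay are all assumptions made up for this example. Note also that <code>Crawl-delay</code> is a non-standard directive that only some crawlers honor.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a forum: ask bots to wait between requests
# and keep them away from the action URL mentioned in the answer.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /delete.php
"""

# Parse the rules from a string instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical crawler name and URLs, used only for illustration.
USER_AGENT = "ExampleBot"
delete_allowed = parser.can_fetch(
    USER_AGENT, "http://forum.example/delete.php?topicID=123"
)
topic_allowed = parser.can_fetch(USER_AGENT, "http://forum.example/topic/123")
delay_seconds = parser.crawl_delay(USER_AGENT)

print("delete.php allowed:", delete_allowed)
print("topic page allowed:", topic_allowed)
print("requested delay (s):", delay_seconds)
```

    <p>A polite crawler would skip any URL for which <code>can_fetch</code> returns false and, if it supports the directive, sleep for the requested delay between requests. Nothing forces a bot to obey these rules, so destructive actions such as deleting a topic should still require a POST request and authentication.</p>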