StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
10805119
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2012-05-29T19:05:01.937
FavoriteCount
0
LastActivityDate
2012-05-29T19:18:42.260
LastEditDate
2017-05-23T11:52:46.487
LastEditorUserId
-1
OwnerUserId
28760
ParentId
10793906
PostTypeId
2
Score
7
ViewCount
0
LastEditorDisplayName
text
Body
I'd recommend that you take a look the answer I posted to a similar question: <a href="https://stackoverflow.com/questions/8404775/how-to-identify-web-crawler/8405803#8405803">How to identify web-crawler?</a> Robots.txt The robots.txt is useful for polite bots, but spammers are generally not polite so they tend to ignore the robots.txt; it's great if you have robots.txt since it can help the polite bots. However, be careful not to block the wrong path as it can block the good bots from crawling content that you actually want them to crawl. User-Agent Blocking by user-agent is not fool-proof either, because spammers often impersonate browsers and other popular user agents (such as the Google bots). As a matter of fact, spoofing the user agent is one of the easiest thing that a spammer can do. Bot Traps This is probably the best way protect yourself from bots that are not polite and that don't correctly identify themselves with the User-Agent. There are at least two types of traps: <ul> <li>The robots.txt trap (which only works if the bot reads the robots.txt): dedicate an off-limits directory in the robots.txt and set up your server to block the IP address of any entity which tries to visit that directory.</li> <li>Create "hidden" links in your web pages that also lead to the forbidden directory and any bot that crawls those links AND doesn't abide by your robots.txt will step into the trap and get the IP blocked.</li> </ul> A hidden link is one which is not visible to a person, such as an anchor tag with no text: <code><a href="http://www.mysite.com/path/to/bot/trap"></a></code>. Alternately, you can have text in the anchor tag, but you can make the font really small and change the text color to match the background color so that humans can't see the link. The hidden link trap can catch any non-human bot, so I'd recommend that you combine it with the robots.txt trap so that you only catch bad bots. Verifying Bots The above steps will probably help you get rid of 99.9% of the spammers, but there might be a handful of bad bots who impersonate a popular bot (such as Googlebot) AND abide by your robots.txt; those bots can eat up the number of requests you've allocated for Googlebot and may cause you to temporarily disallow Google from crawling your website. In that case you have one more option and that's to verify the identity of the bot. Most major crawlers (that you'd want to be crawled by) have a way that you can identify their bots, here is Google's recommendation for verifying their bot: <a href="http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html" rel="nofollow noreferrer">http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html</a> Any bot that impersonates another major bot and fails verification can be blocked by IP. That should probably get you closer to preventing 99.99% of the bad bots from crawling your site.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POhow to allow known web crawlers and block spammers and harmful robots from scanning asp.net website
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USKiril
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POhow to allow known web crawlers and block spammers and harmful robots from scanning asp.net website
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.