Note that there are some explanatory texts on larger screens.

plurals
  1. POEthics of robots.txt
    text
    copied!<p>I have a serious question. Is it ever ethical to ignore the presence of a robots.txt file on a website? These are some of the considerations I've got in mind:</p> <ol> <li><p>If someone puts a web site up they're expecting some visits. Granted, web crawlers are using bandwidth without clicking on ads that may support the site but the site owner is putting their site on the web, right, so how reasonable is it for them to expect that they'll never get visited by a bot? </p></li> <li><p>Some sites apparently use a robots.txt exactly in order to keep their site from being crawled by Google or some other utility that might grab prices and therefore allow people to do price comparisons easily. They have private search engines on the site so they obviously want people to be able to search the site; apparently they just don't want people to be able to easily compare their information with other vendors.</p></li> </ol> <p>As I said, I'm not trying to be argumentative; I would just like to know if anyone has ever come up with a case where it's ethically permissible to ignore the presence of a robots.txt file? I cannot think of a case where it's permissible to ignore the robots.txt mainly because people (or businesses) are paying money to put up their web sites so they should be able to tell the Googles/Yahoos/Other SE's of the world that they don't want to be on their indices. </p> <p>To put this discussion in context, I'd like to create a price comparison website and one of the major vendors has a robots.txt that basically prevents anyone from grabbing their prices. I'd like to be able to get their information but, as I said, I can't justify simply ignoring the wishes of the site owner. </p> <p>I have seen some very sharp discussion here and that's why I would like to hear the opinions of developers that follow Stack Overflow. </p> <p>By the way, there is some discussion of this topic on a <a href="http://news.ycombinator.com/item?id=205937" rel="nofollow noreferrer">Hacker News question</a> but they seem to mainly focus on the legal aspects of this. </p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload