Hi,

You can't do this with the robots.txt file alone. A well-behaved spider downloads robots.txt first and checks which pages on the site it is allowed to index, but a bad spider simply ignores the file.

You can block visitors by IP address or by other criteria (for example, the User-Agent string), but the IP-based method is the most reliable, since a User-Agent string is trivial to fake. Even so, a bad spider can rotate through many IP addresses, so blocking individual addresses won't help you very much.

Why do you want to block them?

-- Octavian

----- Original Message -----
From: "Dave" <dave.mehler@gmail.com>
To: "'Blind sysadmins list'" <blind-sysadmins@lists.hodgsonfamily.org>
Sent: Monday, August 31, 2009 1:32 AM
Subject: [Blind-sysadmins] robots.txt or harvester blocks
Hello,

This one is for webmasters or those who administer web servers. I'm looking for a robots.txt file, or perhaps entries to add to an httpd.conf or .htaccess file, to block or keep out bad spiders and crawlers that proliferate spam, or any type of marketing thingy; in short, anything other than a search engine. If anyone has anything on this, I'm interested.

Thanks,
Dave
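[As a starting point for the .htaccess approach asked about above, here is a minimal sketch using Apache 2.2's mod_setenvif and mod_authz_host. The bot names and the IP range below are placeholders for illustration; substitute whatever actually shows up in your own access logs.]

```apache
# Flag requests whose User-Agent matches a known-bad pattern
# (example names only -- replace with agents seen in your logs)
SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "WebCopier"      bad_bot
SetEnvIfNoCase User-Agent "HTTrack"        bad_bot

# Allow everyone except flagged agents and a misbehaving range
# (192.0.2.0/24 is a documentation-only example network)
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Deny from 192.0.2.0/24
```

Keep in mind this only stops bots that send an honest User-Agent or reuse the same addresses; as noted in the reply, a determined harvester can fake both.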
_______________________________________________ Blind-sysadmins mailing list Blind-sysadmins@lists.hodgsonfamily.org http://lists.hodgsonfamily.org/mailman/listinfo/blind-sysadmins