Ban specific browsers or robots from browsing your site

May 21, '03 09:00:00AM

Contributed by: hagbard

Ever wanted to ban a specific robot from browsing a website you host? Perhaps there's a bot that ignores the robot exclusion rules in robots.txt. Here's how to ban a specific robot or browser from your site. The same technique is useful for many other tasks, such as varying the server's behavior according to particular HTTP request headers. See Apache's documentation on Module mod_setenvif and Environment Variables in Apache for more info.

It only takes two lines in your httpd.conf file to ban a specific browser or robot. The first goes in the <IfModule mod_setenvif.c> section. Somewhere in that section, add:

  BrowserMatch "some_text" badbot
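Replace some_text with a word that appears in the User-Agent HTTP request header field. The text is treated as a regular expression and the match is case-sensitive; use BrowserMatchNoCase instead if case should not matter. For instance, if you wanted to (not that you would!) block the Mozilla browser from your site, use "Mozilla" to replace "some_text". Keep in mind that most browsers include "Mozilla" somewhere in their User-Agent string, so that particular pattern would lock out nearly everyone.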
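
To make the matching rules more concrete, here is a small sketch of a few variations. The bot names (ExampleBot, examplecrawler, ExampleFetcher) and the referring host (spam.example.com) are made up for illustration and are not from any real blocklist; only the directive names come from mod_setenvif.

```apache
<IfModule mod_setenvif.c>
  # Case-sensitive regular expression, matched anywhere in User-Agent.
  # "ExampleBot" is a hypothetical name.
  BrowserMatch "ExampleBot" badbot

  # Case-insensitive variant: matches "examplecrawler", "ExampleCrawler", ...
  BrowserMatchNoCase "examplecrawler" badbot

  # Anchored pattern: only agents whose string starts with "ExampleFetcher/".
  BrowserMatch "^ExampleFetcher/" badbot

  # BrowserMatch is shorthand for SetEnvIf on the User-Agent header,
  # so the same idea works for other request headers, e.g. Referer.
  SetEnvIf Referer "spam\.example\.com" badbot
</IfModule>
```

Several lines can set the same variable, so a single deny rule later on covers all of them.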

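Then, inside your <Directory "/Library/WebServer/Documents"> section, add the deny from env=badbot line just before the Allow from all line (I removed the comments, showing only the active commands). With Order allow,deny, Apache evaluates the Deny rules after the Allow rules no matter where they appear in the file, so the placement is only for readability: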
<Directory "/Library/WebServer/Documents">
  Options Indexes FollowSymLinks MultiViews
  AllowOverride All
  Order allow,deny
  # The next line is the new addition
  Deny from env=badbot
  Allow from all
</Directory>
What we did is quite straightforward. If the User-Agent field contains "some_text" anywhere inside it, the environment variable 'badbot' is set for that request; otherwise it does not exist. Then, when access is checked, any client whose request has 'badbot' set is denied, and Apache answers with a 403 Forbidden response.
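
The Order, Allow and Deny directives used here belong to the older access-control syntax. Apache 2.4 deprecates them in favor of Require (they remain available only through mod_access_compat). As a rough sketch, assuming a 2.4 server with mod_authz_core loaded and the same 'badbot' variable set as described earlier, the equivalent block might look like this:

```apache
<Directory "/Library/WebServer/Documents">
  Options Indexes FollowSymLinks MultiViews
  AllowOverride All
  # Assumes Apache 2.4: every Require inside RequireAll must succeed.
  <RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except requests for which the badbot variable is set.
    Require not env badbot
  </RequireAll>
</Directory>
```

The negated rule has to sit inside a RequireAll container next to a positive rule, because a "Require not" line cannot grant access on its own.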


Mac OS X Hints
http://hints.macworld.com/article.php?story=20030519102221674