Last newsletter I discussed the Robots Exclusion Protocol (REP), a method by which you can tell programs that explore your Web site (a.k.a. spiders) what you would prefer they do and don't look at.
But if you want to exercise even finer control, you might consider using the Robots meta tag.
One of the biggest differences between the meta tag approach and the REP is that meta tags don't require a Webmaster's involvement. You can use the Robots meta tag on any page, whether or not the server has an REP robots.txt file.
Here's an HTML page that uses the Robots meta tag:
<html>
<head>
<meta name="robots" content="noindex,nofollow">
<meta name="description" content="electric helicopters">
<title>Electric Whirlybirds</title>
</head>
<body>
A page of great information on...
</body>
</html>
The content attribute of the Robots meta tag holds a comma-separated list of up to two directives. The first is either "INDEX" or "NOINDEX," indicating whether or not the spidering software should index the page. The second is either "FOLLOW" or "NOFOLLOW," indicating whether the links on the page should be followed.
The allowable permutations for the Robots meta tag contents are:
1. <meta name="robots" content="index,follow">
2. <meta name="robots" content="all">
3. <meta name="robots" content="noindex,follow">
4. <meta name="robots" content="index,nofollow">
5. <meta name="robots" content="noindex,nofollow">
6. <meta name="robots" content="none">
Line 1 of the list above shows the defaults (INDEX and FOLLOW), and line 2 is a shorthand variant of line 1. Likewise, line 6 is a shorthand version of line 5.
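To make the six permutations concrete, here is a minimal Python sketch of how a spider might turn a content value into two yes/no decisions. The function name parse_robots_content is hypothetical, and ignoring unrecognized directives is an assumption of this sketch rather than something the tag's documentation requires.

```python
def parse_robots_content(content):
    """Return (index, follow) booleans for a Robots meta tag content value."""
    index, follow = True, True  # the defaults are INDEX and FOLLOW
    for token in content.split(","):
        directive = token.strip().lower()  # directives are treated case-insensitively
        if directive == "all":
            index, follow = True, True  # shorthand for index,follow
        elif directive == "none":
            index, follow = False, False  # shorthand for noindex,nofollow
        elif directive == "index":
            index = True
        elif directive == "noindex":
            index = False
        elif directive == "follow":
            follow = True
        elif directive == "nofollow":
            follow = False
        # Anything else is ignored (an assumption of this sketch).
    return index, follow


for value in ("index,follow", "all", "noindex,follow",
              "index,nofollow", "noindex,nofollow", "none"):
    print(value, parse_robots_content(value))
```

Starting from the defaults means a page with no Robots meta tag at all, or with an empty content value, is treated the same as "index,follow".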
The description meta tag is also part of the Robots meta tag specification. It defines the plain-text string (i.e., no markup tags) that should be used to summarize the page contents.
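As a rough illustration of how a spider could read both tags from a page, the following Python sketch uses the standard library's html.parser module. The class MetaTagCollector and the function collect_meta are hypothetical names invented for this example, and the sample page is the one shown earlier with its body text shortened.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def collect_meta(html_text):
    """Return a dict mapping meta tag names to their content values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


PAGE = """<html>
<head>
<meta name="robots" content="noindex,nofollow">
<meta name="description" content="electric helicopters">
<title>Electric Whirlybirds</title>
</head>
<body>
A page of great information on...
</body>
</html>"""

meta = collect_meta(PAGE)
print(meta.get("robots"))
print(meta.get("description"))
```

A real spider would fetch the page over HTTP first; the sketch works on a string so that it stays self-contained.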
Note that, like the REP, the Robots meta tag is merely a request to visiting spiders. You cannot enforce it without creating a Web application that checks the value of the User-Agent string in the HTTP request header and explicitly blocks the spider.
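For readers curious what such a blocking application might look like, here is a minimal sketch built on Python's standard http.server module. The blocked names "examplebot" and "badspider" are made up for illustration, as are the names is_blocked, BlockingHandler, and serve. Bear in mind that a spider can send any User-Agent string it likes, so this only stops spiders that identify themselves honestly.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical User-Agent fragments to refuse; real spider names vary.
BLOCKED_AGENT_SUBSTRINGS = ("examplebot", "badspider")


def is_blocked(user_agent):
    """Return True if the User-Agent string contains a blocked fragment."""
    agent = (user_agent or "").lower()
    return any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS)


class BlockingHandler(BaseHTTPRequestHandler):
    """Serve a placeholder page, but answer blocked spiders with 403."""

    def do_GET(self):
        if is_blocked(self.headers.get("User-Agent")):
            self.send_error(403, "Spiders are not permitted here")
            return
        body = b"<html><body>Welcome</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch locally; call serve() yourself to try it out."""
    HTTPServer(("127.0.0.1", port), BlockingHandler).serve_forever()
```

Keeping the decision in the small is_blocked function makes it easy to check the matching rule without starting a server.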
Also note that far fewer spiders pay attention to the Robots meta tag than are smart enough to understand the REP.
For more information on the Robots meta tag, see:
http://info.webcrawler.com/mak/projects/robots/meta-notes.html