ITworld.com

More spidercide

Network World 4/2/01

Mark Gibbs, Network World

Last newsletter I discussed the Robot Exclusion Protocol (REP), a method by which you can tell programs that explore your Web site (a.k.a. spiders) what you would prefer they look at and what you would prefer they leave alone.


But where you want to exercise even finer control, you might consider using the Robots meta tag.

One of the biggest differences between the meta tag approach and the REP is that meta tags don’t require a Webmaster’s involvement. You can use the Robots meta tag on any page, whether or not the server has an REP robots.txt file.

Here’s an HTML page that uses the Robots meta tag:

<html>

<head>

<meta name="robots" content="noindex,nofollow">

<meta name="description" content="electric helicopters">

<title>Electric Whirlybirds</title>

</head>

<body>

A page of great information on…

</body>

</html>

The Robots meta tag is a list of up to two directives. The first is either "INDEX" or "NOINDEX," indicating whether or not the spidering software should index the page, while the second directive can be either "FOLLOW" or "NOFOLLOW" to indicate whether the links on the page should be followed.

The allowable permutations for the Robots meta tag contents are:

1. <meta name="robots" content="index,follow">

2. <meta name="robots" content="all">

3. <meta name="robots" content="noindex,follow">

4. <meta name="robots" content="index,nofollow">

5. <meta name="robots" content="noindex,nofollow">

6. <meta name="robots" content="none">

The first line of the above list shows the defaults (INDEX and FOLLOW), and line 2 is a shorthand variant of line 1. Likewise, line 6 is a shorthand version of line 5.
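To make the six permutations concrete, here is a minimal Python sketch of how a spider might turn the content string into two yes/no decisions. The function name parse_robots_content is hypothetical, and ignoring unrecognized directives is an assumption of this sketch rather than something the specification is known to require.

```python
def parse_robots_content(content):
    """Return (index, follow) for a Robots meta tag content string."""
    # Defaults described in the article: INDEX and FOLLOW.
    index, follow = True, True
    for token in content.split(","):
        token = token.strip().lower()
        if token == "all":          # shorthand for index,follow
            index, follow = True, True
        elif token == "none":       # shorthand for noindex,nofollow
            index, follow = False, False
        elif token == "index":
            index = True
        elif token == "noindex":
            index = False
        elif token == "follow":
            follow = True
        elif token == "nofollow":
            follow = False
        # Anything else is ignored (an assumption of this sketch).
    return index, follow


# Walk through the six permutations listed above.
for content in ["index,follow", "all", "noindex,follow",
                "index,nofollow", "noindex,nofollow", "none"]:
    print(content, parse_robots_content(content))
```

Lines 1 and 2 of the list produce the same pair of values, as do lines 5 and 6, which is what makes "all" and "none" shorthands.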

The description meta tag is also part of the Robots meta tag specification and defines the plain-text string (i.e., no markup tags) that should be used to summarize the page contents.
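As an illustration of what a spider sees, the following Python sketch pulls the robots and description meta tags out of a page like the example above, using the standard library's html.parser module. The class name MetaCollector and the PAGE string are hypothetical names made up for this sketch; a real spider would do considerably more.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Meta names are matched case-insensitively in this sketch.
            self.meta[name.lower()] = content


# A shortened copy of the example page from the article.
PAGE = """<html>
<head>
<meta name="robots" content="noindex,nofollow">
<meta name="description" content="electric helicopters">
<title>Electric Whirlybirds</title>
</head>
<body>A page of great information</body>
</html>"""

collector = MetaCollector()
collector.feed(PAGE)
print(collector.meta.get("robots"))
print(collector.meta.get("description"))
```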

Note that, like the REP, the Robots meta tag is merely a request to visiting spiders. No explicit control can be enforced without creating a Web application that checks the value of the User-Agent string in the HTTP request header and explicitly blocks the spider.
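The kind of Web application described here might look something like the following Python sketch, written as a WSGI wrapper. The spider names in BLOCKED_AGENTS and the function names are hypothetical, chosen only for illustration. Bear in mind that the User-Agent header is supplied by the client, so a spider that lies about its identity will not be caught by a check like this.

```python
# Hypothetical spider names; replace with the agents you want to refuse.
BLOCKED_AGENTS = ("examplebot", "badspider")


def is_blocked(user_agent):
    """Return True if the User-Agent string names a blocked spider."""
    ua = (user_agent or "").lower()
    return any(name in ua for name in BLOCKED_AGENTS)


def block_spiders(app):
    """Wrap a WSGI app so that blocked user agents receive a 403."""
    def wrapper(environ, start_response):
        # WSGI exposes the User-Agent header as HTTP_USER_AGENT.
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden",
                           [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """A stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"A page of great information\n"]


application = block_spiders(hello_app)
```

Any WSGI-capable server could serve the application object. Unlike the meta tag, this check is enforced by the server rather than left to the spider's good manners.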

Also note that far fewer spiders pay attention to the Robots meta tag than are smart enough to understand the REP.

For more information on the Robots meta tag, see:

http://info.webcrawler.com/mak/projects/robots/meta-notes.html

Mark Gibbs is a contributing editor for the Network World reviews section.


Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.