ITworld.com
Getting started with Google sitemaps
ECOMMERCE IN ACTION --- 06/15/2005

James Lewin

Google recently introduced a new service that offers site owners a way to get more pages into Google's search index, have their sites spidered faster, and be better represented in the index. The service, Google Sitemaps, is designed to give site owners a tool to guide Google's crawler.


Google has released a script, Sitemap Generator, that helps create the Sitemap files needed to guide Google's search engine.

Getting Started With Sitemap Generator

To use Google Sitemap Generator, your server needs to support Python 2.2, and you need to be able to connect to your web server and upload files. You also need to know the web server path to the desired output file. Sitemap Generator can create sitemaps from URL lists, web server directories, or access logs.

The application is available for download via SourceForge (see resources).

The archive contains these files:

  • sitemap_gen.py
  • README
  • example_urllist.txt
  • example_config.xml

Configuration

An example configuration file is provided. It's important to read through it and understand it prior to using the script.

<site base_url="http://www.example.com/" store_into="/path/sitemap.xml.gz" verbose="1">

<urllist path="/path/urllist.txt" encoding="UTF-8" />

<directory path="/path/dir" url="http://www.example.com/dir/" />

<accesslog path="/path/access-0.log" />

<filter action="drop" type="wildcard" pattern="*index.htm*" />

</site>

Several configuration changes need to be made. You need to let the script know what your domain is (base_url), where it should write the output to (store_into), and then let it know where you want it to look for the URL information. You can specify individual URLs, URL list text files, directories, and log files. Google recommends setting the output path to the root of your domain.
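Before uploading, it can help to sanity-check the edited file. The following is a minimal sketch, not part of Sitemap Generator: check_config is a hypothetical helper written for a current Python 3 interpreter on your own machine. It looks only at the attributes and elements shown in the example above, plus a `url` element for individual URLs whose name is an assumption to verify against example_config.xml.

```python
import xml.etree.ElementTree as ET

# Attributes of <site> that the article says must be set.
REQUIRED_ATTRS = ("base_url", "store_into")

# Elements that supply URLs. "urllist", "directory" and "accesslog" appear in
# the example config; "url" is an assumed name for individual URL entries.
SOURCE_TAGS = ("url", "urllist", "directory", "accesslog")


def check_config(xml_text):
    """Return a list of human-readable problems found in a config document."""
    problems = []
    site = ET.fromstring(xml_text)
    if site.tag != "site":
        problems.append("root element is <%s>, expected <site>" % site.tag)
    for name in REQUIRED_ATTRS:
        value = site.get(name)
        if not value:
            problems.append("missing attribute: " + name)
        elif value != value.strip():
            # A trailing space inside the quotes becomes part of the path.
            problems.append("stray whitespace in attribute: " + name)
    if not any(child.tag in SOURCE_TAGS for child in site):
        problems.append("no URL source (url, urllist, directory, accesslog)")
    return problems
```

Calling check_config with the text of your config.xml returns an empty list when none of these problems is found. It does not confirm that the paths exist on the server or that the script will accept the file.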

Once config.xml has been edited and saved, it needs to be uploaded to your web server, along with the sitemap_gen.py script and, optionally, a urllist.txt file.
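If you plan to supply a urllist.txt file, a small script can assemble it from URLs you already have. This is a rough sketch under stated assumptions: build_urllist is a hypothetical helper, it writes one absolute URL per line, and it keeps only URLs that start with your base_url on the assumption that URLs outside the site are of no use in its sitemap. Check example_urllist.txt in the archive for the exact format the script expects.

```python
def build_urllist(urls, base_url):
    """Return URL-list text: one URL per line, limited to URLs under base_url."""
    lines = []
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blanks and duplicates
        if not url.startswith(base_url):
            continue  # skip URLs that belong to another site
        seen.add(url)
        lines.append(url)
    # End every entry with a newline; an empty input yields an empty string.
    return "".join(line + "\n" for line in lines)
```

The returned text can be written to urllist.txt with an ordinary file write and uploaded alongside the other files.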

Once the files are copied to your server, you can execute the script:

$ python sitemap_gen.py --config=/path/config.xml

This will create a new sitemap.xml.gz in the location you specified.
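To confirm that the generated file contains the pages you expect, you can decompress it and list its URLs. The sketch below assumes the output follows the Sitemap protocol, in which each page address sits in a `loc` element. sitemap_locations is a hypothetical helper for a current Python 3 interpreter and is separate from sitemap_gen.py.

```python
import gzip
import xml.etree.ElementTree as ET


def sitemap_locations(path):
    """Return the text of every <loc> element in a gzip-compressed sitemap."""
    with gzip.open(path, "rb") as handle:
        tree = ET.parse(handle)
    locations = []
    for element in tree.iter():
        # Tags look like "{namespace}loc"; drop the namespace, whatever it is.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locations.append(element.text.strip())
    return locations
```

For example, sitemap_locations("/path/sitemap.xml.gz") returns the listed URLs in document order, which you can compare against your URL list or directory contents.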

Once you've got a sitemap saved to the root of your web server, Google's crawlers will be able to find it. Google also provides tools for monitoring the crawling of your site using a Sitemaps Account. We'll take a look at this in a future column!

ADDITIONAL RESOURCES

Google Sitemaps (BETA)
https://www.google.com/webmasters/sitemaps/login

SourceForge link:
http://sourceforge.net/project/showfiles.php?group_id=137793&package_id=153422

Google Sitemaps (BETA) Help
https://www.google.com/webmasters/sitemaps/docs/en/sitemap-generator.html

 

James Lewin is a system engineer and Web analyst. He has worked in digital publishing since 1987, and with the Internet since 1995. His articles have appeared in a variety of offline and online publications including IBM DeveloperWorks. Reach him at: lewingroup.com, or via his web site at: http://www.lewingroup.com. Find his most recent ITworld.com articles at: http://www.itworld.com/nl/ecom_in_act/.



Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.