From: www.itworld.com

Getting Started with Google Sitemaps

June 14, 2005

 

Google recently introduced Google Sitemaps, a service designed to give site owners a tool to guide Google's crawler. It offers a way to get more pages into Google's search index, have a site spidered faster, and have the site better represented in the index.



Google has released a script, Sitemap Generator, that helps create the Sitemap files needed to guide Google's search engine.



Getting Started With Sitemap Generator



To use Sitemap Generator, your server needs to support Python 2.2, and you need to be able to connect to your web server and upload files. You also need to know the web server path to the desired output file. Sitemap Generator can create sitemaps from URL lists, from web server directories, or from access logs.



The application is available for download via SourceForge (see resources).



The archive contains these files:


* sitemap_gen.py

* README

* example_urllist.txt

* example_config.xml



Configuration



An example configuration file is provided. It's important to read through it and understand it prior to using the script.

		 <site
		   base_url="http://www.example.com/" 
		   store_into="/path/sitemap.xml.gz"
		   verbose="1">
		 
		   <urllist path="/path/urllist.txt" encoding="UTF-8" />
		 
		   <directory path="/path/dir" url="http://www.example.com/dir/" />
		 
		   <accesslog path="/path/access-0.log" />
		 
		   <filter action="drop" type="wildcard" pattern="*index.htm*" />
		 
		 </site>
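
The filter line in the example drops any URL that matches the pattern *index.htm*. As a rough illustration of what that means, the sketch below assumes the wildcard type behaves like shell-style globbing, as implemented by Python's fnmatch module. That is an assumption rather than something the example file states, and the function name and URLs are made up for the illustration.

```python
from fnmatch import fnmatchcase


def apply_drop_filter(urls, pattern):
    """Return the URLs that do NOT match a shell-style wildcard pattern."""
    return [url for url in urls if not fnmatchcase(url, pattern)]


# Hypothetical URLs, used only to show the effect of the pattern.
candidates = [
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://www.example.com/dir/index.htm",
    "http://www.example.com/dir/page.html",
]

kept = apply_drop_filter(candidates, "*index.htm*")
for url in kept:
    print(url)
```

Under that assumption, both index pages are dropped and the other two URLs are kept, which avoids listing the same page once as / and once as /index.html.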



Several settings need to be changed. Set base_url to your domain and store_into to the path where the script should write its output, then tell the script where to look for URL information. You can specify individual URLs, URL list text files, directories, and log files. Google recommends setting the output path to the root of your domain.
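
Before uploading anything, it can help to sanity-check the edited file. The following is a minimal sketch, not part of Google's distribution: it parses a config and reports a missing base_url or store_into, stray whitespace in either value, and the absence of any URL source. It is meant to run on your own machine with a current Python 3 rather than on the server. The function name check_config is hypothetical, and the url element name for individual URLs is an assumption, since the example above does not show it.

```python
import xml.etree.ElementTree as ET

# A cleaned-up copy of the example configuration, used as test input.
SAMPLE_CONFIG = """<site
  base_url="http://www.example.com/"
  store_into="/path/sitemap.xml.gz"
  verbose="1">
  <urllist path="/path/urllist.txt" encoding="UTF-8" />
  <directory path="/path/dir" url="http://www.example.com/dir/" />
  <accesslog path="/path/access-0.log" />
  <filter action="drop" type="wildcard" pattern="*index.htm*" />
</site>
"""

# Elements that tell the script where to find URLs.
# "url" is an assumed name; the other three appear in the example.
SOURCE_TAGS = ("url", "urllist", "directory", "accesslog")


def check_config(xml_text):
    """Return a list of human-readable problems found in a config string."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "site":
        problems.append("root element should be <site>, found <%s>" % root.tag)
    for attr in ("base_url", "store_into"):
        value = root.get(attr)
        if not value:
            problems.append("missing %s attribute" % attr)
        elif value != value.strip():
            problems.append("%s has leading or trailing whitespace" % attr)
    if not any(child.tag in SOURCE_TAGS for child in root):
        problems.append("no URL source (url, urllist, directory, accesslog)")
    return problems
```

Calling check_config on the contents of your config.xml returns an empty list when none of these problems is found. A clean result only means the file is well-formed and has the basics in place; it does not guarantee the script will accept it.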



Once config.xml has been edited and saved, it needs to be uploaded to your web server, along with the sitemap_gen.py script and, optionally, a urllist.txt file.



Once the files are copied to your server, you can execute the script:
$ python sitemap_gen.py --config=/path/config.xml


This will create a new sitemap.xml.gz in the location you specified.
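
To confirm that the output contains the pages you expect, you can list the URLs in the generated file. This is a minimal sketch under a few assumptions: it runs with a current Python 3 (not necessarily on the server), the file is gzip-compressed XML, and each page address sits in a loc element. The function name list_sitemap_urls is hypothetical. The code ignores the XML namespace so that it does not depend on a particular schema version.

```python
import gzip
import xml.etree.ElementTree as ET


def list_sitemap_urls(path):
    """Return the text of every <loc> element in a gzip-compressed sitemap."""
    with gzip.open(path, "rb") as handle:
        tree = ET.parse(handle)
    urls = []
    for element in tree.iter():
        # Namespaced tags look like "{namespace}loc"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

For example, list_sitemap_urls("/path/sitemap.xml.gz") would return the page addresses as a list of strings, which you can compare against your URL list or directory contents.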

Once you've got a sitemap saved to the root of your web server, Google's crawlers will be able to find it. Google also provides tools for monitoring the crawling of your site using a Sitemaps Account. We'll take a look at this in a future column!

ADDITIONAL RESOURCES

Google Sitemaps (BETA)

Sourceforge link

Google Sitemaps (BETA) Help