Getting Started with Google Sitemaps
Google recently introduced a new service, Google Sitemaps, that gives site owners a tool to guide Google's crawler. It offers a way to get more pages into Google's search index, have a site spidered faster, and have it represented better.
Google has also released a script, Sitemap Generator, that helps create the Sitemap files needed to guide Google's search engine.
Getting Started with Sitemap Generator
To use Google Sitemap Generator, you need a server that supports Python 2.2, and you need to be able to connect to your web server and upload files. You also need to know the web server path to the desired output file. Sitemap Generator can create sitemaps from URL lists, web server directories, or access logs.
The application is available for download via SourceForge (see resources).
The archive contains these files:
* sitemap_gen.py
* README
* example_urllist.txt
* example_config.xml
Configuration
An example configuration file is provided. It's important to read through it and understand it prior to using the script.
<site base_url="http://www.example.com/" store_into="/path/sitemap.xml.gz" verbose="1">
  <urllist path="/path/urllist.txt" encoding="UTF-8" />
  <directory path="/path/dir" url="http://www.example.com/dir/" />
  <accesslog path="/path/access-0.log" />
  <filter action="drop" type="wildcard" pattern="*index.htm*" />
</site>
You need to change several settings. Tell the script what your domain is (base_url) and where it should write its output (store_into), then tell it where to look for URL information. You can specify individual URLs, URL list text files, directories, and log files. Google recommends setting the output path to the root of your domain.
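Before uploading anything, it can help to sanity-check the edited configuration. The sketch below is not part of Sitemap Generator. It is a hypothetical helper, written for a current Python 3 interpreter on your own machine, that only looks at the attributes and elements shown in the example configuration above (base_url, store_into, urllist, directory, accesslog). The function name summarize_config and the sample string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample text mirroring the example configuration shown in this article.
SAMPLE_CONFIG = """<site base_url="http://www.example.com/" store_into="/path/sitemap.xml.gz" verbose="1">
  <urllist path="/path/urllist.txt" encoding="UTF-8" />
  <directory path="/path/dir" url="http://www.example.com/dir/" />
  <accesslog path="/path/access-0.log" />
  <filter action="drop" type="wildcard" pattern="*index.htm*" />
</site>"""

# URL sources that appear in the example configuration.
SOURCE_TAGS = ("urllist", "directory", "accesslog")


def summarize_config(xml_text):
    """Return (base_url, store_into, source_tags) or raise ValueError.

    Hypothetical helper: it checks only that the two required attributes
    are present and that at least one URL source is listed.
    """
    site = ET.fromstring(xml_text)
    if site.tag != "site":
        raise ValueError("root element must be <site>")

    # Strip stray whitespace, which is easy to leave inside a quoted path.
    base_url = (site.get("base_url") or "").strip()
    store_into = (site.get("store_into") or "").strip()
    if not base_url:
        raise ValueError("base_url is missing")
    if not store_into:
        raise ValueError("store_into is missing")

    sources = [child.tag for child in site if child.tag in SOURCE_TAGS]
    if not sources:
        raise ValueError("no URL sources listed")
    return base_url, store_into, sources


if __name__ == "__main__":
    print(summarize_config(SAMPLE_CONFIG))
```

To check your own file, read it into a string and pass that string to summarize_config in place of SAMPLE_CONFIG.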
Once config.xml has been edited and saved, it needs to be uploaded to your web server, along with the sitemap_gen.py script and, optionally, a urllist.txt file.
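If you plan to supply a URL list, a short script can build it for you. The sketch below assumes the list is a plain text file with one absolute URL per line; check example_urllist.txt in the archive for the exact format the script expects. The function name build_urllist and the sample paths are hypothetical, and the code targets a current Python 3 interpreter rather than the Python 2.2 the generator itself requires.

```python
from urllib.parse import urljoin


def build_urllist(base_url, paths):
    """Return URL-list text: one absolute URL per line, duplicates dropped.

    Assumption: the list is plain text with one URL per line.
    """
    seen = set()
    lines = []
    for path in paths:
        # Resolve each site-relative path against the site's base URL.
        url = urljoin(base_url, path.strip())
        if url not in seen:
            seen.add(url)
            lines.append(url)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample_paths = ["", "about.html", "/products/", "about.html"]
    print(build_urllist("http://www.example.com/", sample_paths), end="")
```

Save the returned text as urllist.txt and point the urllist path in your configuration at the uploaded copy.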
Once the files are copied to your server, you can execute the script:
$ python sitemap_gen.py --config=/path/config.xml
This will create a new sitemap.xml.gz in the location you specified.
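It is worth confirming that the generated file contains the URLs you expect. The sketch below is a hypothetical check, not something shipped with Sitemap Generator. It assumes the output is a gzip-compressed XML file in which each page address sits in a loc element, and it ignores the XML namespace so it does not depend on a particular schema version. It targets a current Python 3 interpreter.

```python
import gzip
import xml.etree.ElementTree as ET


def list_sitemap_urls(path):
    """Return the text of every <loc> element in a gzipped sitemap file."""
    with gzip.open(path, "rb") as handle:
        root = ET.parse(handle).getroot()

    urls = []
    for element in root.iter():
        # Tags look like "{namespace}loc"; compare only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

Calling list_sitemap_urls with the store_into path from your configuration returns a list of URLs that you can compare against the pages you meant to include.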
Once you've got a sitemap saved to the root of your web server, Google's crawlers will be able to find it. Google also provides tools for monitoring the crawling of your site using a Sitemaps Account. We'll take a look at this in a future column!
ADDITIONAL RESOURCES
ITworld.com, Ecommerce in Action