Getting Started with Google Sitemaps
Google recently introduced a new service, Google Sitemaps, that gives site owners a tool to guide Google's crawler. The service offers a way to get more pages into Google's search index, speed up site spidering, and improve how a site is represented in the index.
Google has also released a script, Sitemap Generator, that helps create the Sitemap files needed to guide its crawler.
Getting Started with Sitemap Generator
In order to use Google Sitemap Generator, your server needs to support Python 2.2, and you need to be able to connect to your web server and upload files. You also need to know the web server path to the desired output file. Sitemap Generator can create sitemaps from URL lists, web server directories, or from access logs.
The application is available for download via SourceForge (see resources).
The archive contains these files:
* sitemap_gen.py
* README
* example_urllist.txt
* example_config.xml
Configuration
An example configuration file is provided. It's important to read through it and understand it prior to using the script.
<site base_url="http://www.example.com/" store_into="/path/sitemap.xml.gz" verbose="1">
  <urllist path="/path/urllist.txt" encoding="UTF-8" />
  <directory path="/path/dir" url="http://www.example.com/dir/" />
  <accesslog path="/path/access-0.log" />
  <filter action="drop" type="wildcard" pattern="*index.htm*" />
</site>
Several configuration changes need to be made. You need to tell the script what your domain is (base_url) and where it should write the output (store_into), and then tell it where to look for URL information. You can specify individual URLs, URL list text files, directories, and log files. Google recommends setting the output path to the root of your domain.
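If you maintain several sites, it can be handy to produce the configuration file from a small script rather than editing it by hand. The following is a minimal sketch, not part of Google's package: the helper name build_config and its parameters are hypothetical, and it only uses the elements and attributes that appear in the example configuration above. It is written for Python 3 and meant to be run on your own machine; sitemap_gen.py itself is a separate script.

```python
import xml.etree.ElementTree as ET


def build_config(base_url, store_into, urllist_path=None, directory=None,
                 accesslog_path=None, drop_pattern=None):
    """Return config XML as a string (hypothetical helper, not Google's API).

    directory is an optional (filesystem_path, url) pair.
    """
    site = ET.Element("site", {
        "base_url": base_url,
        "store_into": store_into,
        "verbose": "1",
    })
    if urllist_path:
        ET.SubElement(site, "urllist",
                      {"path": urllist_path, "encoding": "UTF-8"})
    if directory:
        fs_path, url = directory
        ET.SubElement(site, "directory", {"path": fs_path, "url": url})
    if accesslog_path:
        ET.SubElement(site, "accesslog", {"path": accesslog_path})
    if drop_pattern:
        # Mirrors the wildcard "drop" filter from the example configuration.
        ET.SubElement(site, "filter", {
            "action": "drop",
            "type": "wildcard",
            "pattern": drop_pattern,
        })
    return ET.tostring(site, encoding="unicode")


if __name__ == "__main__":
    # Placeholder paths, as in the example configuration.
    print(build_config(
        "http://www.example.com/",
        "/path/sitemap.xml.gz",
        urllist_path="/path/urllist.txt",
        drop_pattern="*index.htm*",
    ))
```

Save the printed output as config.xml and compare it against example_config.xml before relying on it.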
Once config.xml has been edited and saved, it needs to be uploaded to your web server, along with the sitemap_gen.py script and, optionally, a urllist.txt file.
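If you choose to supply a URL list, you can build it from a list of page paths. This is a rough sketch under two assumptions that you should check against example_urllist.txt: that the list holds one absolute URL per line, and that every URL should fall under your base_url. The function name make_url_lines is hypothetical.

```python
from urllib.parse import urljoin


def make_url_lines(base_url, paths):
    """Turn page paths into absolute URLs, dropping duplicates and
    anything that does not fall under base_url (hypothetical helper)."""
    seen = set()
    lines = []
    for path in paths:
        url = urljoin(base_url, path)
        if not url.startswith(base_url):
            continue  # outside the site, so leave it out of the list
        if url in seen:
            continue  # already listed
        seen.add(url)
        lines.append(url)
    return lines


if __name__ == "__main__":
    pages = ["about.html", "/dir/page.html", "about.html"]
    for line in make_url_lines("http://www.example.com/", pages):
        print(line)
```

Redirect the output to urllist.txt and point the urllist element of your configuration at that file.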
Once the files are copied to your server, you can execute the script:
$ python sitemap_gen.py --config=/path/config.xml
This will create a new sitemap.xml.gz in the location you specified.
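Before leaving the file for Google to pick up, it is worth checking that it decompresses and parses cleanly. The sketch below is one way to do that with the Python 3 standard library; the function name list_sitemap_urls is hypothetical, and it assumes only that each page address sits in a loc element, whatever XML namespace the generator used.

```python
import gzip
import xml.etree.ElementTree as ET


def list_sitemap_urls(path):
    """Return the text of every <loc> element in a gzipped sitemap."""
    with gzip.open(path, "rb") as fh:
        root = ET.parse(fh).getroot()
    urls = []
    for element in root.iter():
        # Strip any "{namespace}" prefix so the check works for any schema URI.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


if __name__ == "__main__":
    import sys

    # Usage: python check_sitemap.py /path/sitemap.xml.gz
    found = list_sitemap_urls(sys.argv[1])
    print("%d URLs found" % len(found))
    for url in found[:10]:
        print(url)
```

A parse error or a count of zero suggests the configuration did not point at any usable URL source.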
Once you've got a sitemap saved to the root of your web server, Google's crawlers will be able to find it. Google also provides tools for monitoring the crawling of your site using a Sitemaps Account. We'll take a look at this in a future column!
ADDITIONAL RESOURCES
ITworld.com, Ecommerce in Action