Google has released a script, Sitemap Generator, that helps create the
Sitemap files Google's crawlers use to find the pages on your site.
Getting Started With Sitemap Generator
To use Google Sitemap Generator, your server needs Python 2.2 or later,
and you need to be able to connect to your web server and upload files.
You also need to know the file system path on the server where the
output file should be written. Sitemap Generator can build sitemaps from
URL lists, web server directories, or access logs.
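Of the three input types, access logs are the least obvious. The following Python 3 sketch illustrates the general idea. It does not describe how sitemap_gen.py itself parses logs. It assumes your server writes the Common Log Format and pulls out the URLs of successful (status 200) GET requests, dropping duplicates. The function name urls_from_access_log is hypothetical.

```python
import re
from urllib.parse import urljoin

# Captures method, path and status from a Common Log Format line such as:
# 127.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF_REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def urls_from_access_log(lines, base_url):
    """Return absolute URLs of successful GET requests, in first-seen order, without duplicates."""
    seen = set()
    urls = []
    for line in lines:
        match = CLF_REQUEST.search(line)
        if match is None:
            continue  # not a Common Log Format request line
        if match.group("method") != "GET" or match.group("status") != "200":
            continue  # skip other methods and non-200 responses
        url = urljoin(base_url, match.group("path"))
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '127.0.0.1 - - [10/Oct/2005:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209',
        '127.0.0.1 - - [10/Oct/2005:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    ]
    print(urls_from_access_log(sample, "http://www.example.com/"))
    # ['http://www.example.com/index.html']
```

Counting only 200 responses keeps broken links and redirects out of the sitemap. The real script may apply different rules.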
The script is available for download from SourceForge (see Additional
Resources).
The archive contains these files:
- sitemap_gen.py
- README
- example_urllist.txt
- example_config.xml
Configuration
The archive includes an example configuration file, example_config.xml.
Read through it and make sure you understand each setting before running
the script.
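<?xml version="1.0" encoding="UTF-8"?>
<site
  base_url="http://www.example.com/"
  store_into="/path/sitemap.xml.gz"
  verbose="1">
  <!-- A text file listing URLs -->
  <urllist path="/path/urllist.txt" encoding="UTF-8" />
  <!-- A directory on disk and the URL it is served from -->
  <directory path="/path/dir" url="http://www.example.com/dir/" />
  <!-- A web server access log to scan for URLs -->
  <accesslog path="/path/access-0.log" />
  <!-- Drop any URL matching this wildcard pattern -->
  <filter action="drop" type="wildcard" pattern="*index.htm*" />
</site>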
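Before uploading, you can check the configuration file for simple mistakes, such as a missing attribute or a path with stray spaces. The Python 3 sketch below uses the standard library's xml.etree.ElementTree. The helper summarize_config is hypothetical and recognizes only the input element names that appear in the example above.

```python
import xml.etree.ElementTree as ET

# Input element names taken from the example configuration (an assumption;
# the real script may accept others).
INPUT_TAGS = ("urllist", "directory", "accesslog")


def summarize_config(site):
    """Check a parsed <site> element and return (base_url, store_into, inputs).

    Raises ValueError if base_url or store_into is missing, or if either
    value has leading or trailing whitespace.
    """
    if site.tag != "site":
        raise ValueError("root element must be <site>, not <%s>" % site.tag)
    settings = {}
    for attr in ("base_url", "store_into"):
        value = site.get(attr)
        if not value:
            raise ValueError("missing required attribute: " + attr)
        if value != value.strip():
            raise ValueError("%s has stray whitespace: %r" % (attr, value))
        settings[attr] = value
    # Report each input as (element name, path attribute); filters are skipped.
    inputs = [(child.tag, child.get("path")) for child in site if child.tag in INPUT_TAGS]
    return settings["base_url"], settings["store_into"], inputs
```

For a file on disk, pass the parsed root element: summarize_config(ET.parse("config.xml").getroot()).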
You need to change several settings. Tell the script your site's root
URL (base_url) and where to write the output (store_into). Then tell it
where to look for URL information. Inputs can be individual URLs, URL
list text files (urllist), directories (directory), and web server log
files (accesslog). A filter element drops any URL that matches its
pattern. Google recommends writing the sitemap to the root of your
domain, because a sitemap can only list URLs at or below the directory
it is stored in.
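The directory and filter settings imply a mapping from files to URLs. The Python 3 sketch below shows one way to do that mapping. Every file under a local directory becomes a URL under the matching base URL, and any URL matching a wildcard pattern is dropped. This is our reading of what those settings mean, not a copy of sitemap_gen.py's logic. The real script may, for example, handle index or hidden files differently. urls_from_directory is a hypothetical name.

```python
import fnmatch
import os
from urllib.parse import quote


def urls_from_directory(root_dir, root_url, drop_patterns=()):
    """Map every file under root_dir to a URL under root_url.

    URLs matching any wildcard in drop_patterns are skipped, in the spirit
    of a <filter action="drop" type="wildcard"> rule.
    """
    if not root_url.endswith("/"):
        root_url += "/"
    urls = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root_dir)
            # Use URL slashes instead of OS separators, and percent-encode
            url = root_url + quote(rel.replace(os.sep, "/"))
            if any(fnmatch.fnmatchcase(url, pattern) for pattern in drop_patterns):
                continue
            urls.append(url)
    return urls
```

With the example settings, a call such as urls_from_directory("/path/dir", "http://www.example.com/dir/", ["*index.htm*"]) would list every file except index pages.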
Save the edited file as config.xml and upload it to your web server,
along with the sitemap_gen.py script and, optionally, a urllist.txt
file.
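If you supply a urllist.txt file, it helps to know roughly what it contains. The sketch below assumes the layout used in example_urllist.txt as best we recall it. Each line holds one URL, optionally followed by space-separated key=value attributes, and blank lines and lines beginning with # are ignored. Check the example file in your archive before relying on this. read_urllist is a hypothetical helper.

```python
def read_urllist(lines):
    """Parse URL-list lines into (url, attributes) pairs.

    Assumed format: one URL per line, optionally followed by space-separated
    key=value pairs; blank lines and lines starting with '#' are ignored.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        url, *rest = line.split()
        attrs = {}
        for token in rest:
            key, sep, value = token.partition("=")
            if sep:  # ignore tokens that are not key=value
                attrs[key] = value
        entries.append((url, attrs))
    return entries
```

To read a file, open it with the encoding declared in the urllist element, for example open("urllist.txt", encoding="utf-8"), and pass the file object to read_urllist.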
Once the files are on your server, run the script:
$ python sitemap_gen.py --config=/path/config.xml
This creates a new sitemap.xml.gz at the location you set in store_into.
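To confirm the script worked, you can list the URLs in the generated file. This Python 3 sketch decompresses sitemap.xml.gz with the gzip module and collects every loc element. It matches on the element's local name, so it does not assume a particular sitemap namespace. The function name sitemap_locations is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def sitemap_locations(path):
    """Return the text of every <loc> element in a gzip-compressed sitemap."""
    with gzip.open(path, "rb") as f:
        root = ET.parse(f).getroot()
    locs = []
    for elem in root.iter():
        # Namespaced tags look like "{namespace}loc"; compare only the local part
        if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
            locs.append(elem.text.strip())
    return locs
```

For example, print(len(sitemap_locations("/path/sitemap.xml.gz"))) reports how many URLs were written.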
Once your sitemap is saved to the root of your web server, Google's
crawlers will be able to find it. Google also provides tools, through a
Sitemaps account, for monitoring how your site is crawled. We'll take a
look at these in a future column!
Additional Resources
Google Sitemaps (Beta)
https://www.google.com/webmasters/sitemaps/login
Sitemap Generator downloads (SourceForge)
http://sourceforge.net/project/showfiles.php?group_id=137793&package_id=153422
Google Sitemaps (Beta) Help: Sitemap Generator
https://www.google.com/webmasters/sitemaps/docs/en/sitemap-generator.html