Getting Started with Google Sitemaps

ITworld.com, Ecommerce in Action | Small Business

Google recently introduced a service that offers site owners a way to get more pages into Google's search index, have their sites spidered faster, and improve how their sites are represented. The service, Google Sitemaps, is designed to give site owners a tool to guide Google's crawler.



Google has also released a script, Sitemap Generator, that helps create the Sitemap files needed to guide Google's search engine.



Getting Started With Sitemap Generator



To use Sitemap Generator, your web server needs to support Python 2.2, and you need to be able to connect to the server and upload files. You also need to know the path on the web server where the output file should be written. Sitemap Generator can create Sitemaps from URL lists, web server directories, or access logs.
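For the URL-list source, the text file can be prepared ahead of time. The following is a minimal sketch that assumes the list holds one absolute URL per line; the helper names `build_url_list` and `write_url_list` are hypothetical, and the sketch is written for a current Python on your own machine rather than for the Python 2.2 the script itself targets. Check example_urllist.txt in the archive for the exact format the script accepts.

```python
from urllib.parse import urljoin


def build_url_list(base_url, paths):
    """Return one absolute URL per path, in input order, without duplicates."""
    seen = set()
    urls = []
    for path in paths:
        # Strip the leading slash so the path is resolved relative to base_url.
        url = urljoin(base_url, path.lstrip("/"))
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def write_url_list(urls, out_path):
    """Write the URLs to a text file, one per line (assumed format)."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")


if __name__ == "__main__":
    pages = ["/", "/products.html", "about/contact.html", "/products.html"]
    write_url_list(build_url_list("http://www.example.com/", pages), "urllist.txt")
```

Building the list in a separate function keeps the URL logic easy to check before anything is written to disk.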



The application is available for download via SourceForge (see resources).



The archive contains these files:


* sitemap_gen.py

* README

* example_urllist.txt

* example_config.xml



Configuration



An example configuration file is provided. It's important to read through it and understand it prior to using the script.

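    <site
      base_url="http://www.example.com/"
      store_into="/path/sitemap.xml.gz"
      verbose="1">

      <!-- URLs read from a URL list text file -->
      <urllist path="/path/urllist.txt" encoding="UTF-8" />

      <!-- URLs taken from a web server directory -->
      <directory path="/path/dir" url="http://www.example.com/dir/" />

      <!-- URLs taken from a web server access log -->
      <accesslog path="/path/access-0.log" />

      <!-- Drop any URL that matches the wildcard pattern -->
      <filter action="drop" type="wildcard" pattern="*index.htm*" />

    </site>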



Several settings need to be changed. Tell the script your site's base URL (base_url), where to write the output (store_into), and where to look for URL information. URL sources can be individual URLs, URL list text files, directories, and log files. Google recommends setting the output path to the root of your domain.
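The sample configuration also contains a filter that drops URLs matching a wildcard pattern. The sketch below illustrates the effect of such a pattern, assuming the script's wildcard type behaves like shell-style globbing as implemented by Python's `fnmatch` module; that is an assumption, so consult the README for the script's actual matching rules. The function name `apply_drop_filter` is hypothetical.

```python
from fnmatch import fnmatchcase


def apply_drop_filter(urls, pattern):
    """Keep only the URLs that do NOT match the shell-style wildcard pattern."""
    return [url for url in urls if not fnmatchcase(url, pattern)]


urls = [
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://www.example.com/dir/index.htm",
    "http://www.example.com/dir/page.html",
]

# "*index.htm*" matches any URL containing "index.htm", including "index.html".
for url in apply_drop_filter(urls, "*index.htm*"):
    print(url)
```

Under that assumption, only the site root and /dir/page.html survive, which keeps duplicate index pages out of the Sitemap.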



Once config.xml has been edited and saved, upload it to your web server along with the sitemap_gen.py script and, optionally, a urllist.txt file.
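Before uploading, it can help to sanity-check the edited configuration locally. The sketch below is not part of Sitemap Generator; it is a hypothetical helper (`check_config`) that only tests for the attributes and source elements named in this article, plus stray whitespace in attribute values. It does not know about every element the script accepts, so treat a "no URL source" warning as a prompt to look, not as an error.

```python
import xml.etree.ElementTree as ET

REQUIRED_ATTRS = ("base_url", "store_into")
# Only the source elements shown in this article's sample configuration.
SOURCE_TAGS = ("urllist", "directory", "accesslog")


def check_config(xml_text):
    """Return a list of problems found; an empty list means none were detected."""
    try:
        site = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    problems = []
    if site.tag != "site":
        problems.append("root element is <%s>, expected <site>" % site.tag)
    for attr in REQUIRED_ATTRS:
        value = site.get(attr)
        if not value:
            problems.append("missing attribute: " + attr)
        elif value != value.strip():
            # A stray space would become part of the URL or file name.
            problems.append("stray whitespace in attribute: " + attr)
    if not any(site.find(tag) is not None for tag in SOURCE_TAGS):
        problems.append("no URL source element found")
    return problems


if __name__ == "__main__":
    with open("config.xml", "rb") as fh:
        for problem in check_config(fh.read()):
            print(problem)
```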



Once the files are copied to your server, you can execute the script:
$ python sitemap_gen.py --config=/path/config.xml


This will create a new sitemap.xml.gz in the location you specified.
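To confirm what the script wrote, the compressed file can be opened and its URLs listed. This is a minimal sketch, assuming the output is gzip-compressed XML in which each page address sits in a `loc` element; the function name `sitemap_urls` is hypothetical, and the path is the placeholder from the sample configuration.

```python
import gzip
import xml.etree.ElementTree as ET


def sitemap_urls(path):
    """Return the text of every <loc> element in a gzip-compressed Sitemap."""
    with gzip.open(path, "rb") as fh:
        root = ET.parse(fh).getroot()
    # Tags parse as "{namespace}loc"; compare only the local name so the
    # sketch does not depend on which schema version the file declares.
    return [
        element.text.strip()
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text
    ]


if __name__ == "__main__":
    for url in sitemap_urls("/path/sitemap.xml.gz"):
        print(url)
```

Comparing the printed list against the pages you expected is a quick way to spot a filter that drops too much or a source that was never read.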

Once you've got a sitemap saved to the root of your web server, Google's crawlers will be able to find it. Google also provides tools for monitoring the crawling of your site using a Sitemaps Account. We'll take a look at this in a future column!

ADDITIONAL RESOURCES

Google Sitemaps (BETA)

SourceForge link

Google Sitemaps (BETA) Help


