Inside Google Sitemaps

February 1, 2006, 11:57 AM —  ITworld.com — 

Google Sitemaps is a program that lets any site developer publish a map of their site and submit it to Google for indexing. It's designed to let companies guide Google's crawlers and should help get pages indexed more quickly and thoroughly.



The program is free, and given the importance of Google traffic to most sites, it's worth taking a look at how it works and what tools are available to support it.



Why Sitemaps Matter



It can be frustrating to see how long it takes for web pages to get indexed at sites like Google. If your pages don't show up in Google, you're probably losing out on traffic and sales or advertising revenue.



Sitemaps are a way to help Google index your site. While Google says that this won't raise your page ranking, it may mean that your site gets more traffic, simply because your pages are more thoroughly indexed.



Google has published a case study on Sitemaps looking at the experience of Interactive Sites, a developer of web-based products for the hospitality industry. The company found Sitemaps easy to implement, and effective at increasing traffic to their sites.



According to John Blayter, Interactive Sites' Director of Engineering, "We were incredibly impressed with how easy Sitemaps was to integrate into our CMS." Blayter and his team integrated Sitemaps into the company's CMS in a single day and placed it on 60 client websites.



Interactive Sites' implementation of Sitemaps improved both the coverage and freshness of content in Google's index. Since implementing Sitemaps, the company's clients have benefited from an average increase of 125 percent in the number of indexed pages - and in some cases more than 240 percent.



For many companies, an increase of 125 percent in the number of pages indexed could result in a big jump in site traffic.



What are Sitemaps?



A Sitemap is an XML file that resides on your website and is updated regularly to reflect changes and additions to your site.



The file is in a fairly simple format. In Sitemap XML files, the main element, urlset, encloses a collection of URLs. For each page on your site that you want to include in the Sitemap file, there is a url element with one required child element, the loc, which is just the page's URL. So, in its simplest form, a Sitemap is just a big list of the URLs at your site.


The url element also has three optional child elements, lastmod, changefreq, and priority:

* lastmod indicates when the URL was last updated

* changefreq indicates how often the page is typically updated

* priority indicates the importance of this URL relative to the other URLs at your site.
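
The structure described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not Google's own generator: the function name build_sitemap, the example.com URLs, and the sample values are hypothetical, and the namespace URI is an assumption that should be checked against the Sitemap schema listed under Resources.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the Sitemap protocol; confirm it against the
# published schema before relying on it.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"

OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Return Sitemap XML as UTF-8 bytes.

    `pages` is an iterable of dicts. Each dict must have a "loc" key (the
    page's URL) and may have "lastmod", "changefreq", and "priority".
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # loc is the one required child element.
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional children are written only when a value was supplied.
        for tag in OPTIONAL_TAGS:
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    sitemap = build_sitemap([
        {"loc": "http://www.example.com/"},
        {"loc": "http://www.example.com/news.html",
         "lastmod": "2006-02-01", "changefreq": "daily", "priority": 0.8},
    ])
    print(sitemap.decode("utf-8"))
```

The first page in the example carries only a loc, which corresponds to the "big list of URLs" form; the second adds all three optional elements.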


Tools for Supporting Sitemaps



There are many tools available to support using Sitemaps. First on the list would be tools for validating the Sitemap file. Because Sitemaps are XML files, you can use standard XML tools to validate them. Google has published XML schemas defining the elements and attributes that can be used in Sitemaps (see Resources).
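
As a rough first pass before running a real schema validator, a short Python script can confirm that a Sitemap is well-formed XML and that every url element has a loc. This sketch is not schema validation: Python's standard library has no XML Schema validator, so it checks only the basic structure described in this article. The function name check_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def check_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means none were detected."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if _local(root.tag) != "urlset":
        problems.append(f"root element is {_local(root.tag)!r}, expected 'urlset'")

    for index, url in enumerate(root, start=1):
        if _local(url.tag) != "url":
            problems.append(f"child #{index} is {_local(url.tag)!r}, expected 'url'")
            continue
        # Each url element needs exactly one non-empty loc child.
        locs = [child for child in url if _local(child.tag) == "loc"]
        if len(locs) != 1 or not (locs[0].text or "").strip():
            problems.append(f"url #{index} needs exactly one non-empty loc")
    return problems


if __name__ == "__main__":
    sample = b"<urlset><url><lastmod>2006-02-01</lastmod></url></urlset>"
    for problem in check_sitemap(sample):
        print(problem)
```

A file that passes this check can still fail validation against the published schema, for example because of an invalid changefreq value, so a schema-aware XML tool remains the final authority.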



Google provides a script, Sitemap Generator, that is designed to help site owners create Sitemaps. It's a free Python script, downloadable via SourceForge (see Resources). Many third-party tools have also emerged to support the creation of Sitemaps, including standalone scripts and updates to content management systems.



Google also provides a Web-based tool for checking on the status of your sites within their system. It lets you see if Google has read your Sitemap, and if it was successful or if there was an error.



You can also Verify your site, which lets you look at more detailed statistics. Google provides you with a unique file name for your site. It looks something like this: google8ad97837ed3875e83.html. You create an empty file with the name that Google provides and put it at the root level of your Web site. Then you log into Google and click the Verify button within your account. This process lets Google know that you control the site.
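
Creating the empty verification file is easy to script. The sketch below assumes you have write access to the directory your web server uses as the site root. The function name create_verification_file and the web root path in the usage example are hypothetical, and the file name is only the sample shown above, so substitute the name Google gives you.

```python
from pathlib import Path


def create_verification_file(web_root, file_name):
    """Create an empty file called `file_name` at the top level of `web_root`."""
    path = Path(web_root) / file_name
    # Writing zero bytes guarantees the file is empty even if it already existed.
    path.write_bytes(b"")
    return path


if __name__ == "__main__":
    # Hypothetical web root; adjust to match your server's configuration.
    created = create_verification_file("/var/www/html", "google8ad97837ed3875e83.html")
    print(f"Created {created}")
```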



Once you have verified your site, Google provides additional information:

* Sitemap details and errors

* Indexing information about your site

* Query stats about your site

* Crawl stats about your site

* Page analysis of your site

* URLs from your site that Google was unable to crawl, and why it couldn't crawl them



Google Sitemaps provides a way for sites to ensure that they are thoroughly indexed. This, and the Web-based system Google provides for understanding the indexing of your site, make Sitemaps an important tool for web developers and site owners.




ADDITIONAL RESOURCES



Sitemap Protocol



Sitemap Schema

For Sitemaps


For Sitemap index files



Getting started with Google sitemaps



Sitemap Generator



Google Site Overview (Google account required)

 

