
Creating A Sitemap XML File

What Is A Sitemap XML File?

Sitemaps provide a search-engine-friendly way for website owners to list the pages on their site that are available for crawling. Search engines use sitemap.xml files as a table of contents for your website, so they know what pages you have and how to reach them.

A sitemap XML file contains a list of all the URLs on your site, each with corresponding metadata including each page’s update frequency, when it was last updated and how important each page is relative to the other pages on your website. This allows search engine spiders to more intelligently crawl and index your pages. While this is no guarantee of search engine placement or indexing, it is a big help.

Google, Yahoo! and Microsoft (MSN, Bing) all support the Sitemap protocol. While their web crawlers still discover your content primarily via in-page links, your sitemap XML file complements this data and allows crawlers to do a more thorough and accurate job. That is why creating a sitemap XML file is a critical component of Search Engine Optimization (SEO).

While there seems to be a cloud of mystery around creating sitemaps, it’s actually easy – it’s just not explained very well in many places. In the following sections, we walk you through a clear step-by-step process of creating a sitemap XML file for your website.

How To Create A Sitemap XML File

The only program you need for creating sitemaps is Notepad or another plain text editor (not Word or a word processing program). The first 2 lines of your sitemap should be:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Then, you just start listing your pages in the specified format. Note that all pages must reside on the same domain. So http://www.yoursite.com, http://subdomain.yoursite.com and https://www.yoursite.com/cart would each have separate sitemaps. Before we show you the example of a URL listing in your sitemap, let’s first look at the tag definitions. It is important to understand what purpose each tag serves so you know what values to include.

From the official sitemap protocol at sitemaps.org:

XML Tag Requirement Description
<urlset> Required Encapsulates the file and references the current protocol standard. This is the tag that opens the document and closes it at the end of the file.
<url> Required “Parent” tag for each URL entry. The remaining tags are “children” of this tag. This means that for each URL you add to the file, you will open and close the entry with <url> and </url>.
<loc> Required This is the actual URL of one page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it (e.g. http://www.yoursite.com/). This value must be less than 2,048 characters.
<lastmod> Optional The date of last modification of the file. This date should be in W3C Datetime format. In this format, you can enter just the date as YYYY-MM-DD, or the date and time including time zone designator (TZD) as YYYY-MM-DDThh:mmTZD (The T appears literally and time is in 24-hour or military format, so 5:37 PM eastern standard time on January 2, 2010 would be 2010-01-02T17:37-05:00, where -05:00 is the offset from GMT for eastern standard time)

Note that this tag is separate from the If-Modified-Since (304) header the server can return, and search engines may use the information from both sources differently.

<changefreq> Optional How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

The value “always” should be used to describe documents that change each time they are accessed. The value “never” should be used to describe archived URLs. The value of this tag is considered a hint to the search engines and not a command.

Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked “hourly” less frequently than that, and they may crawl pages marked “yearly” more frequently than that. Crawlers may periodically crawl pages marked “never” so that they can handle unexpected changes to those pages.

<priority> Optional The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites—it only lets the search engines know which pages you deem most important for the crawlers.

The default priority of a page is 0.5.

The priority you assign to a page is not likely to influence the position of your URLs in a search engine’s result pages (SERPs). Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index.

Assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site.
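The tag rules above can be checked mechanically before an entry goes into your file. Here is a minimal Python sketch of such a check. The function name check_entry and its argument names are our own assumptions for illustration and are not part of the sitemap protocol, and the date pattern covers only a simplified subset of W3C Datetime.

```python
import re

# Valid <changefreq> values from the tag definitions above.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Accepts YYYY-MM-DD, optionally followed by Thh:mm (with optional :ss) and a
# time zone designator (Z, +hh:mm or -hh:mm). A simplified subset of W3C Datetime.
LASTMOD_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2}))?$"
)


def check_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must begin with the protocol")
    if len(loc) >= 2048:
        problems.append("loc must be less than 2,048 characters")
    if lastmod is not None and not LASTMOD_PATTERN.match(lastmod):
        problems.append("lastmod is not in a recognized W3C Datetime form")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not a valid value")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems
```

An entry that follows the rules, such as check_entry("http://www.yoursite.com/", "2010-01-02", "monthly", 0.8), comes back with an empty list.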

So let’s use that information to continue our sitemap XML file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

    <url>
        <loc>http://www.yoursite.com/</loc>
        <lastmod>2010-01-02</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>

The code above shows a single URL or page on your site as it would be listed in your sitemap.xml file. To complete the file, repeat this for each page on your site. Note the trailing forward slash after the URL. This is necessary for any folder path, but is not used for individual pages (e.g. http://www.yoursite.com/boots/ is correct, http://www.yoursite.com/boots.html/ is incorrect). When in doubt, copy the address into the browser address bar and press enter. It will display the URL properly.
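If you would rather not type every entry by hand, the same structure can be produced by a short script. The sketch below uses Python's standard xml.etree.ElementTree module. The function name build_sitemap and the dictionary keys are assumptions chosen to mirror the tag names, and the xml_declaration argument requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML (as bytes) from a list of dicts.

    Each dict needs a 'loc' key and may also have 'lastmod',
    'changefreq' and 'priority' keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit the child tags in a fixed order, skipping any that are absent.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_bytes = build_sitemap([
    {"loc": "http://www.yoursite.com/", "lastmod": "2010-01-02",
     "changefreq": "monthly", "priority": 0.8},
])
print(sitemap_bytes.decode("utf-8"))
```

A side benefit of building the file this way is that ElementTree writes characters such as & in tag text as entity references for you.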

Using Special Characters in Your Sitemap.xml

If there are any special characters in your URL, they need to be entered into your sitemap.xml file in a particular way. Certain characters need to be “entity escaped”. The reason for this is XML recognizes these characters as having a special meaning. They need to be changed so the program that will read your XML file doesn’t think you’re trying to tell it to do something when you really want it to read your URL.

The characters that need to be entity escaped are the ampersand (&) to “&amp;”, the double quotation mark (") to “&quot;”, the single quotation mark or apostrophe (') to “&apos;”, the greater-than symbol (>) to “&gt;” and the less-than symbol (<) to “&lt;”. This means whenever you have one of these symbols in your URL, simply replace it with the corresponding escape sequence (the part in quotes).
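For illustration, the five replacements can be done in one call with Python's standard xml.sax.saxutils.escape function. This is only a sketch: the helper name entity_escape is our own, and the sample URL is made up for the example.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the two quote characters are
# supplied as extra replacements.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def entity_escape(url):
    """Replace the five XML special characters with their entity references."""
    return escape(url, QUOTE_ENTITIES)


print(entity_escape("http://www.yoursite.com/page.php?id=23&cat=block"))
# http://www.yoursite.com/page.php?id=23&amp;cat=block
```

Because the ampersand is handled first, the entities added for the other characters are not escaped a second time.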

If your URL contains any other characters that are not plain English letters, numbers or standard URL punctuation, you must replace each one with its percent-encoded UTF-8 byte sequence. View our HTML Special Characters page for a full listing that you can bookmark and keep as a reference.

Each of the two-character blocks in the UTF-8 encoding should be preceded by a “%” symbol and no spaces. For an example of this UTF-8 encoding, we will use the copyright symbol, ©. If that symbol were in your URL, it would be replaced with the UTF-8 encoding for that character, which is %C2%A9. The most common need to replace a special character with UTF-8 exists when accent characters, such as à, é and ñ, are used.
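You do not have to look these codes up by hand. As a small sketch, Python's standard urllib.parse.quote function produces the same percent-encoded UTF-8 sequences. The helper name encode_path is an assumption for this example.

```python
from urllib.parse import quote


def encode_path(path):
    """Percent-encode a URL path as UTF-8, leaving "/" untouched."""
    return quote(path)  # quote() encodes as UTF-8 and keeps "/" by default


print(encode_path("©"))                      # %C2%A9
print(encode_path("àccent.php"))             # %C3%A0ccent.php
print(encode_path("featured artists.html"))  # featured%20artists.html
```

Apply this to the path portion only, not to the whole URL, because quote() would also encode the colon after the protocol.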

Continuing our example sitemap, we will add some URLs that require special character encoding. We will use
http://www.yoursite.com/àccent.php?id=23&cat=block
http://www.yoursite.com/cart/number?id='12'
The à, the ampersand and the single quotation marks will need to be encoded. Therefore, our sitemap.xml now looks like:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

    <url>
        <loc>http://www.yoursite.com/</loc>
        <lastmod>2010-01-02</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
    <url>
        <loc>http://www.yoursite.com/%C3%A0ccent.php?id=23&amp;cat=block</loc>
        <lastmod>2010-01-08</lastmod>
        <changefreq>daily</changefreq>
    </url>
    <url>
        <loc>http://www.yoursite.com/cart/number?id=&apos;12&apos;</loc>
        <lastmod>2010-01-12</lastmod>
        <priority>0.3</priority>
    </url>

Notice that we did not use all 4 tags in every URL reference block. Remember, the lastmod, changefreq and priority tags are optional.

Some Sitemap.XML Rules

  • Each sitemap you provide can have no more than 50,000 URLs
  • Each sitemap you provide can be no larger than 50MB (52,428,800 bytes) uncompressed; the original version of the protocol set this limit at 10MB (10,485,760 bytes)
  • Your sitemap file should be named “sitemap.xml”
  • Your sitemap file should be placed in the root directory of your website
    (e.g. http://www.yoursite.com/sitemap.xml)
  • Remember, each sub-domain requires its own sitemap.xml file, as does any secure (https://) section of your site that you want indexed by search engines. URLs from http://subdomain.yoursite.com cannot be included in the sitemap for http://www.yoursite.com.
  • If you need to list more than 50,000 URLs, you must create multiple Sitemap files.
  • If you have multiple sitemaps under the same domain, you must create a sitemap index file (see below).
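If a script generates your URLs, splitting them to respect the 50,000-URL rule takes only a few lines. The following is a sketch. The function name split_into_sitemaps and the sample URLs are assumptions for illustration, and it checks only the URL count, not the file size.

```python
MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` URLs."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


# Hypothetical site with 120,000 pages.
all_urls = [f"http://www.yoursite.com/page{i}.html" for i in range(120_000)]
chunks = split_into_sitemaps(all_urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written to its own sitemap file and referenced from a sitemap index file.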

Example Sitemap XML

Here is our finished example sitemap.xml file.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

    <url>
        <loc>http://www.yoursite.com/</loc>
        <lastmod>2010-01-02</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
    <url>
        <loc>http://www.yoursite.com/%C3%A0ccent.php?id=23&amp;cat=block</loc>
        <lastmod>2010-01-08</lastmod>
        <changefreq>daily</changefreq>
    </url>
    <url>
        <loc>http://www.yoursite.com/cart/number?id=&apos;12&apos;</loc>
        <lastmod>2010-01-12</lastmod>
        <priority>0.3</priority>
    </url>
    <url>
        <loc>http://www.yoursite.com/archive/data.html</loc>
        <lastmod>2008-10-25T17:42-05:00</lastmod>
        <changefreq>never</changefreq>
    </url>
    <url>
        <loc>http://www.yoursite.com/featured%20artists.html</loc>
        <changefreq>weekly</changefreq>
    </url>
    <url>
        <loc>http://www.yoursite.com/samplepage.html</loc>
        <lastmod>2010-02-06T08:30+02:00</lastmod>
        <changefreq>yearly</changefreq>
        <priority>0.5</priority>
    </url>
    <url>
        <loc>http://www.yoursite.com/query?=live&amp;set=newuser</loc>
        <changefreq>always</changefreq>
        <priority>0.7</priority>
    </url>
</urlset>

Sitemap Index File

If you have multiple sitemaps on the same domain (for example, because you have more than 50,000 pages), you need to create a sitemap index file. The requirements and example here come directly from sitemaps.org. The Sitemap index file must:

  • Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag
  • Include a <sitemap> entry for each Sitemap as a parent XML tag
  • Include a <loc> child entry for each <sitemap> parent tag

The optional <lastmod> tag is also available for Sitemap index files. A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file.

For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://subdomain.yoursite.com. As with Sitemaps, your Sitemap index file must be UTF-8 encoded and entity escaped. The following example shows a Sitemap index that lists two Sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.example.com/sitemap1.xml</loc>
        <lastmod>2004-10-01T18:23:17+00:00</lastmod>
    </sitemap>
    <sitemap>
        <loc>http://www.example.com/sitemap2.xml</loc>
        <lastmod>2005-01-01</lastmod>
    </sitemap>
</sitemapindex>
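A sitemap index like this one can also be generated by a script. The sketch below uses Python's standard xml.etree.ElementTree module. The function name build_sitemap_index and the (loc, lastmod) tuple layout are assumptions for illustration, and the xml_declaration argument requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Build a sitemap index (as bytes) from (loc, lastmod) tuples.

    lastmod may be None, in which case the optional tag is left out.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


index_bytes = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap2.xml", "2005-01-01"),
])
print(index_bytes.decode("utf-8"))
```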

Submitting Your Sitemap.xml File to the Search Engines

There are 3 ways to submit sitemap xml files to search engines. The first is to use the search engine’s submission tool. To do this, you visit the search engine and follow their instructions. This is typically the preferred way to submit your sitemap. The second method is to allow search engine crawlers to find your sitemap via your robots.txt file. Simply add the following on its own line in your robots.txt file.

Sitemap: http://www.example.com/sitemap.xml

You can specify more than one sitemap in your robots.txt file by adding the URL path to each sitemap on its own line. If you have multiple sitemaps and are using a sitemap index file, you only need to reference the sitemap index file in your robots.txt file. The same as with sitemaps, if you have more than one sitemap index file, list the URL path to each file on its own line. If you want to learn more about robots.txt, visit our robots.txt tutorial.
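To confirm that your robots.txt exposes the sitemaps you expect, you can read it back with Python's standard urllib.robotparser module. This is a sketch only: the helper name list_sitemaps and the sample robots.txt content are assumptions, and the site_maps() method requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def list_sitemaps(robots_txt):
    """Return the URLs from all Sitemap: lines in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present.
    return parser.site_maps() or []


sample = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""
print(list_sitemaps(sample))  # ['http://www.example.com/sitemap.xml']
```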

The third method is to submit your sitemap using an HTTP request. This request pings the search engine, letting it know your sitemap is there and needs to be crawled. The general format of this request is:

<searchengine_URL>/ping?sitemap=sitemap_url

where <searchengine_URL> is the URL provided by the search engine, and sitemap_url is the path to your sitemap. If you have a sitemap index file, you only need to send the HTTP request for that file, not each individual sitemap that it references. Also, the sitemap URL in the HTTP request needs to be URL-encoded (percent-encoded). So when you enter the path to your sitemap (http://www.yoursite.com/sitemap.xml), it will look like:

<searchengine_URL>/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.xml

where %3A is the percent-encoded form of a colon “:” and %2F is the percent-encoded form of a forward slash “/”. According to the Sitemap standard, “You can issue the HTTP request using wget, curl, or another mechanism of your choosing. A successful request will return an HTTP 200 response code; if you receive a different response, you should resubmit your request. The HTTP 200 response code only indicates that the search engine has received your Sitemap, not that the Sitemap itself or the URLs contained in it were valid.”
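The encoded request URL can be assembled with Python's standard urllib.parse.quote function, as in this sketch. The function name build_ping_url is our own, and http://searchengine.example is a placeholder for illustration, not a real search engine address. Use the URL your search engine provides.

```python
from urllib.parse import quote


def build_ping_url(search_engine_url, sitemap_url):
    """Build the ping request URL with the sitemap URL percent-encoded."""
    # safe="" makes quote() encode ":" and "/" as well.
    return f"{search_engine_url}/ping?sitemap={quote(sitemap_url, safe='')}"


# Placeholder search engine address.
print(build_ping_url("http://searchengine.example",
                     "http://www.yoursite.com/sitemap.xml"))
# http://searchengine.example/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.xml
```

The sketch only builds the URL. Sending it with wget, curl or a script is a separate step.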

Common Sitemap XML Questions

1. What if I have a blog with 1000 pages… do you really expect me to type all of that out?
Thankfully no. WordPress has a few good sitemap plugins that will generate a sitemap for your entire blog at the touch of a button. Content Management Systems (CMS) like Joomla also have similar plugins. If your blog is on the same domain as your main website, copy the contents of the generated XML sitemap into your main sitemap (without copying the header and <urlset> tags since they are already present in your main sitemap file). You can also keep them separate and create a sitemap index file to reference them both so you can more easily keep them updated as you update your website and blog.

2. URLs on my site have session IDs in them. Do I need to remove them?
Yes. Including session IDs in URLs may result in incomplete and redundant crawling of your site.

3. Does position of a URL in a Sitemap influence its use?
No. The position of a URL in the Sitemap is not likely to impact how it is used or regarded by search engines.

4. My site has thousands of URLs; can I somehow submit only those that have changed recently?
There are 2 possible solutions to this. The first option is to list the URLs that change frequently in a small number of Sitemaps and then use the <lastmod> tag in your Sitemap index file to identify those Sitemap files. Search engines can then incrementally crawl only the changed Sitemaps. The second option is to get a sitemap generator script for your site and have it run periodically to capture updated pages. Then you can do the above without as much manual labor.
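To illustrate the first option, here is a rough Python sketch of how a consumer of your sitemap index could pick out only the sitemaps changed on or after a given date. The function name changed_since is an assumption, and the comparison looks only at the date part of <lastmod>, ignoring any time and time zone offset. Real search engines may work differently.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(index_xml, since_date):
    """Return <loc> values whose <lastmod> date is on or after since_date.

    since_date is a YYYY-MM-DD string. Entries without <lastmod> are skipped.
    """
    root = ET.fromstring(index_xml)
    changed = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", namespaces=NS)
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        # The first 10 characters of a W3C Datetime are YYYY-MM-DD,
        # which sorts correctly as plain text.
        if loc and lastmod and lastmod.strip()[:10] >= since_date:
            changed.append(loc.strip())
    return changed
```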


Copyright © 2002- MyMultiHost.com, Hylidix LLC - #74, Reading, MA 01867-0174. All rights reserved. Disclaimer | Disclosure