XML Marks the Spot: A Guide to Sitemaps

Getting lost in the world of sitemaps? We’ve got the lowdown on XML sitemaps and why they’re so important to the visibility of your website.

Daria Szymanska

There is a big misunderstanding about sitemaps. Many website owners believe that once they’ve created and implemented an HTML sitemap on their site, an XML sitemap is unnecessary. This isn’t quite right. Although an XML sitemap is not mandatory, or even needed for the proper functioning of a website, it is recommended that you have one. In some cases, especially for large websites, a sitemap is a must if you want to follow SEO best practices.

In this blog, we will explain what an XML sitemap is, best practices for sitemaps and how your website and business can benefit from having a sitemap.

 

What is an XML sitemap?

An XML sitemap is a file that contains a list of URLs on your website. It works as a roadmap that tells search engines what content is available on the website and leads them to the most important pages. The format is defined by a shared schema, the Sitemaps protocol, which is supported by all the major search engines (Google, Bing, Yahoo).

A standard XML sitemap consists of a few elements:

  • <urlset> (required) – Encapsulates the file and references the current protocol standard
  • <url> (required) – Parent tag for each URL entry. The remaining tags are children of this tag
  • <loc> (required) – URL of the page. This URL must contain the domain name
  • <lastmod> (optional) – The date the page was last modified
  • <changefreq> (optional) – How frequently the page is likely to change
  • <priority> (optional) – The priority of this URL relative to other URLs on your site
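
To see how these tags fit together, here is a minimal sketch in Python that writes a small sitemap using only the standard library. The function name build_sitemap, the example.com URLs and the sample date are our own placeholders for illustration, not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional, mirroring the tags listed above.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional child tags are written only when a value is supplied.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, used only to show the shape of the output.
example_pages = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15", "priority": 1.0},
    {"loc": "https://www.example.com/blog/"},
]
sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

Note that every <loc> value is a full URL including the domain name, as the protocol requires.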

Although the most popular format of a sitemap is an XML sitemap, there are other types of sitemaps you can implement on your website. The kinds of sitemaps you choose depend on the type of website you have, so always consider what is most suitable for your site before you start to build and submit a sitemap.

If the images or videos on your site are crucial to your business (e.g. for photographers or ecommerce sites), it is worth creating a separate sitemap for them. In other cases a separate sitemap is a waste of crawl budget, and the best practice is to add images and videos to the existing sitemap using Google’s sitemap extensions for additional media types.

To add media and images to your sitemap you must first add the correct XML namespace to the urlset tag. This tells Google which schemas your sitemap uses to communicate page information.
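
As a rough illustration, the sketch below declares Google’s image extension namespace on the urlset tag and lists images under their page. The helper name build_image_sitemap and the example.com URLs are hypothetical; the two namespace URIs are the ones published for the Sitemaps protocol and Google’s image extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Register prefixes so the output uses a default namespace plus "image:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Build sitemap XML from a dict of {page URL: [image URLs]}."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # Each image gets its own <image:image> block inside the page's <url>.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page with two product photos.
image_sitemap_xml = build_image_sitemap({
    "https://www.example.com/products/red-shoes": [
        "https://www.example.com/img/red-shoes-front.jpg",
        "https://www.example.com/img/red-shoes-side.jpg",
    ],
})
print(image_sitemap_xml)
```

The output carries an xmlns:image declaration on the urlset tag, which is the namespace step described above.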

 

Types of Sitemap

There are two types of sitemap:

Static sitemap

A static sitemap is a sitemap generated once with a tool such as Screaming Frog. It’s an easy way to create and submit your sitemap to Google. The drawback is that it quickly becomes out of date: every time you publish new content, or add or remove pages, you have to update the sitemap manually. To check whether you have a static sitemap, look at the file, as it usually includes a comment naming the tool that was used to create it.

<!-- Generated by Screaming Frog SEO Spider 8.3 -->

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

Including videos and images in your XML sitemap can help Google discover this type of content and potentially include it in search results.

One of the first things you should do is to check if your site has a sitemap.
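
One quick way to check is to look for a Sitemap line in the site’s robots.txt file. The sketch below does this with Python’s standard library (Python 3.8 or later); the function name sitemaps_listed_in and the sample robots.txt content are our own illustration. Bear in mind that a missing line doesn’t prove there is no sitemap, as it may exist without being referenced in robots.txt.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_listed_in(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no "Sitemap:" line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content for illustration.
sample_robots = """User-agent: *
Disallow: /checkout/
Sitemap: https://www.example.com/sitemap.xml
"""
found = sitemaps_listed_in(sample_robots)
print(found)
```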

 

Dynamic sitemap

Unlike a static sitemap, a dynamic sitemap is updated automatically. To create one you can use one of the many dynamic sitemap generators available, for example the Yoast SEO plugin for WordPress.

 

Benefits of Having a Sitemap

As we mentioned before, creating an XML sitemap for your website is not mandatory. Having a sitemap doesn’t guarantee that your pages will be indexed, but it won’t hurt your rankings. As the saying goes, plenty is no plague, and submitting a sitemap brings several benefits:

  • First and foremost, your sitemap works like the index of a book – it helps search engines understand your website structure and content more easily. As a result, new pages can be found, indexed and displayed faster.
  • A sitemap keeps search engines updated, as it tells them when your site was last modified, when new content was added and so on. This helps search engines decide whether they should crawl and index the website again.
  • Sitemaps are especially recommended if you have a large site with a deep structure, or one where you constantly make changes and add or remove pages, e.g. an e-commerce site. Just remember that sitemaps have a limit – each sitemap shouldn’t contain more than 50,000 URLs.
  • A sitemap is very helpful if you have migrated your site and want to inform search engines about a new list of URLs to be crawled and indexed.
  • Sitemaps are also useful for new websites that don’t have many backlinks.
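
If your site has more than 50,000 URLs, the usual approach under the Sitemaps protocol is to split them across several sitemap files and list those files in a sitemap index. Below is a minimal sketch; the function names, the sitemap-N.xml file naming and the example.com URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # the per-file limit mentioned above


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks of at most `limit` URLs each."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that points at each child sitemap file."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# Hypothetical catalogue of 120,000 product pages.
all_urls = [f"https://www.example.com/product/{n}" for n in range(120_000)]
chunks = split_into_sitemaps(all_urls)

# One child sitemap file per chunk; the file names are made up.
child_files = [
    f"https://www.example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_files)
print(len(chunks), "sitemap files needed")
```

You would then submit the index file, rather than each child sitemap, to the search engines.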

 

Best Practices for XML Sitemaps

Not all URLs have to be included in a sitemap. Below are the best practices you should keep in mind:

  • Include only canonical URLs. Submit the preferred version of each URL so that search engines don’t spend time and crawl budget on duplicates.
  • Keep only URLs that return a 200 status code. Remove any URLs that return a 301 or 404; the only time to keep 301 URLs in a sitemap is during a site migration.
  • Use only your preferred URL format. Make sure that your sitemap URLs contain the domain name. If your site uses HTTPS, your sitemap URLs should also use this version.
  • Exclude paginated pages.
  • Exclude pages blocked by robots.txt and pages with a noindex tag. Submitting blocked or noindex pages in a sitemap is inconsistent: you tell search engines that these are important pages that should be indexed and, at the same time, that they shouldn’t be indexed.
  • Use a dynamically generated XML sitemap to keep your sitemap fresh and up to date.
  • Submit your sitemap to Google Search Console and Bing Webmaster Tools to speed up the crawling and indexing process. It is recommended that you reference your sitemap in your robots.txt file as well.

  • Include the <lastmod> tag in your sitemap to tell search engines when each URL was last updated. As John Mueller said some time ago, most sitemap tags such as <priority> and <changefreq> are not taken into consideration. The only sitemap tag that really matters is <lastmod>, which can speed up re-crawling of URLs.
  • List each URL in a single sitemap only. Having a URL in multiple sitemaps isn’t a problem as such, but generally a URL should be submitted in only one XML sitemap.
  • Create a separate sitemap for images, videos and news only if indexation of that content drives your KPIs.
  • Keep your sitemap file as small as possible; compress it using gzip.
  • Include hreflang tags in your sitemap if your site is written in more than one language. Adding the language extension will help search engines understand which version of a page should be displayed in the search results of specific regions.
  • Investigate orphan and missing pages in your sitemap. An orphan page is a page that appears in an XML sitemap but is not linked internally. Keeping orphan pages in a sitemap gives search engines more URLs to crawl, and if those pages are outdated or incorrect they shouldn’t be seen by users at all. If orphan pages are important, they should be indexed and internal links to them should be added.
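
Several of these rules can be applied automatically before a sitemap is written. The sketch below filters a list of crawled pages and then compresses the result with gzip. The Page record and its field names are hypothetical, and the HTTPS check assumes your site is served over HTTPS; adapt both to the data your own crawler or CMS provides.

```python
import gzip
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record describing one crawled page."""
    url: str
    status: int              # HTTP status code returned by the page
    canonical: str           # URL named in the page's canonical tag
    noindex: bool = False
    blocked_by_robots: bool = False


def belongs_in_sitemap(page):
    """Apply the rules above: 200 status, canonical, HTTPS, indexable."""
    return (
        page.status == 200
        and page.canonical == page.url
        and page.url.startswith("https://")  # assumes an HTTPS site
        and not page.noindex
        and not page.blocked_by_robots
    )


def select_sitemap_urls(pages):
    """Return eligible URLs in crawl order, each listed once."""
    return list(dict.fromkeys(p.url for p in pages if belongs_in_sitemap(p)))


def gzip_sitemap(xml_text):
    """Compress sitemap XML so it can be served as sitemap.xml.gz."""
    return gzip.compress(xml_text.encode("utf-8"))


# Made-up crawl results for illustration.
crawl = [
    Page("https://www.example.com/", 200, "https://www.example.com/"),
    Page("https://www.example.com/old", 301, "https://www.example.com/new"),
    Page("https://www.example.com/shoes?sort=price", 200, "https://www.example.com/shoes"),
    Page("https://www.example.com/basket", 200, "https://www.example.com/basket", noindex=True),
    Page("https://www.example.com/shoes", 200, "https://www.example.com/shoes"),
]
urls = select_sitemap_urls(crawl)
print(urls)
```

Here the redirect, the non-canonical sorted listing and the noindex basket page are all left out, so only the homepage and the canonical shoes page remain.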

 

Test Your Sitemap

Once your sitemap is created and submitted to Google Search Console (GSC) and Bing Webmaster Tools, you should check it regularly in order to spot any errors. There are different tools to help you quickly and easily audit your site and fix problems. One of the best tools we use in our agency, and highly recommend, is Screaming Frog. Both GSC and Bing will also give you insight into how your sitemap performs.
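
For a quick check of your own between full audits, a short script can catch the most common issues. The sketch below is a simplified illustration and not a replacement for the tools above: the function name audit_sitemap and the sample sitemap are made up, and the HTTPS check assumes your site uses HTTPS.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(xml_text, max_urls=50_000):
    """Return a list of human-readable problems found in a sitemap."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    problems = []
    entries = root.findall("sm:url", NS)
    if len(entries) > max_urls:
        problems.append(f"{len(entries)} URLs exceeds the limit of {max_urls}")
    locs = []
    for entry in entries:
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            problems.append("<url> entry without a <loc>")
            continue
        locs.append(loc)
        if not loc.startswith("https://"):
            problems.append(f"not an absolute HTTPS URL: {loc}")
        if entry.find("sm:lastmod", NS) is None:
            problems.append(f"missing <lastmod>: {loc}")
    # Report any URL that appears more than once in the file.
    for loc, count in Counter(locs).items():
        if count > 1:
            problems.append(f"listed {count} times: {loc}")
    return problems


# Made-up sitemap containing three deliberate mistakes.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>http://www.example.com/about</loc><lastmod>2024-01-10</lastmod></url>
  <url><loc>https://www.example.com/</loc></url>
</urlset>"""
report = audit_sitemap(sample)
for problem in report:
    print(problem)
```

This only inspects the file itself. Checking that each URL really returns a 200 status, or is linked internally, still needs a crawler.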

Sitemaps are one of the best ways for Google to recognise the pages of your website so it can rank them accordingly.