27 Jun Sitemaps Explained: Frequently Asked Questions About Your Sitemap

From the early days of SEO to the present, sitemaps have been an integral part of ensuring sites are consistently crawled and indexed by search engines. Even as Google’s crawlers have grown more efficient, maintaining and submitting a sitemap to Google remains the primary method for guiding Google’s crawl. A properly maintained sitemap can go a long way toward improving your site’s organic visibility.

Do I need a sitemap?

Though smaller websites with only a hundred pages may not need a sitemap, larger ecommerce sites absolutely do. With the number of links found on any one page (representing dozens of different directions for Google’s crawlers to go), it’s important to signal to Google which pages you would like its bots to crawl. Essentially, if there is a page you want indexed, it should be in your sitemap.
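For reference, a sitemap is simply an XML file that lists the URLs you want crawled. A minimal example (the URLs here are placeholders) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want indexed -->
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/category/widgets</loc>
  </url>
</urlset>
```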

How do I generate a sitemap?

Most ecommerce platforms come with the ability to automatically generate your sitemap, though they may require configuration. An automatically generated sitemap is preferred, and should be regenerated daily if possible. Daily sitemap generation ensures that any new products or categories on your site get submitted to Google without any additional action, though you should manually submit to Google any new URLs that require more immediate crawling.

Sitemaps are limited to 50,000 URLs (and 50MB uncompressed) per sitemap file. If your site has more than 50,000 total URLs to submit, your sitemap needs to be broken up into segments of 50,000 URLs or fewer. To submit the whole group, you can create a sitemap index: a file that lists the individual sitemap segments that together cover your full set of URLs.
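A sitemap index is itself a small XML file that points to each segment. A minimal sketch, with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <sitemap> entry points to one segment of up to 50,000 URLs -->
  <sitemap>
    <loc>https://www.example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>
```

You submit the index file itself in Google Search Console, and Google discovers each segment from it.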

A list of sitemaps submitted to Google via a sitemap index

How can I make the most of tracking my sitemaps in Google?

If you can control how your sitemaps are segmented, consider grouping your URLs in a way that makes indexation easier to track. Instead of one 50,000-URL sitemap, you could submit five separate 10,000-URL sitemaps segmented by category, page type, or any delineation of your choosing. Segmenting your sitemaps into groups gives you an at-a-glance view of the indexation status of any group of pages. With a traditional configuration of all URLs in one sitemap, it can be much harder to notice indexation issues affecting certain types of pages.
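As an illustration, a sitemap index segmented by page type might look like the following (the file names are hypothetical; use whatever grouping fits your site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Hypothetical segments grouped by page type rather than split by count -->
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-categories.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>
```

In Search Console, each segment then reports its own submitted and indexed counts, so a drop in, say, product-page indexation stands out immediately.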

What pages should I include in my sitemap?

Simply put, if you want a page indexed, include it in your sitemap. Conversely, it’s nearly as important that pages you don’t want indexed are excluded from the sitemap. Including a page in your sitemap signals to Google that you consider it important, so it sends a mixed signal if a page in your sitemap carries a meta noindex tag. If Google begins wasting your crawl budget on pages that should not be crawled, it can come at the expense of your important pages, with Google crawling fewer pages overall. Thankfully, Google will report these pages with the error “Submitted URL marked ‘noindex’” on the Coverage screen in Google Search Console. If you have pages with this error, remove the noindex tags from pages you want crawled, or remove the pages from your sitemap if they are truly noindexed pages.

Another common mistake with sitemap submission is the inclusion of non-canonical URLs in your sitemap. If your sitemap directs Google to one version of a page but the canonical tag on that page references another URL, it is left to Google to decide which version to index, which may result in worse rankings for your site. In Search Console, excluded URLs like these are listed with the message “Duplicate, submitted URL not selected as canonical.” These pages should be investigated with Google’s URL Inspection Tool to identify the canonical URL Google has selected for the page. With Google’s selected canonical in hand, you can determine whether the issue is with the URL you submitted or with the canonical tag on the page.

One final type of URL to exclude from your sitemap is URLs blocked from crawling by your robots.txt file. Google will report these pages with the error “Submitted URL blocked by robots.txt.” If you find these pages in your sitemap, first determine whether they should be blocked, then either remove them from your sitemap or update your robots.txt file to allow them to be crawled.
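For example, a robots.txt rule like the one below (the path is hypothetical) would trigger that error for any matching URL in your sitemap:

```
# Hypothetical robots.txt rule blocking a crawl path
User-agent: *
Disallow: /checkout/
```

If `/checkout/` URLs appear in your sitemap, Google will flag them with the error above; either drop them from the sitemap or remove the Disallow rule, depending on whether you actually want them crawled.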

A sample sitemap entry

How important are the “lastmod,” “changefreq,” and “priority” properties in my sitemaps?

In an XML sitemap, three optional properties can be listed for each page, each with a different intended purpose:

  • lastmod: The last time the page was modified. This field is usually updated automatically in your sitemap when you make changes to a page.
  • changefreq: An estimate of how often the page content changes, with values ranging from “always” through “hourly,” “daily,” “weekly,” “monthly,” and “yearly,” to “never.”
  • priority: The relative importance of the page on a scale of 0.0 to 1.0, with 1.0 being the highest priority and 0.5 the default value.
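Taken together, a single sitemap entry carrying all three optional properties might look like this (the URL and values are illustrative):

```xml
<url>
  <loc>https://www.example.com/category/widgets</loc>
  <!-- lastmod uses W3C date format (YYYY-MM-DD) -->
  <lastmod>2018-06-01</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.8</priority>
</url>
```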

Thankfully, the only property to worry about here is lastmod, and that field usually updates automatically. While changefreq and priority were once used by crawlers, John Mueller of Google has gone on the record that these fields no longer matter:

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">The URL + last modification date is what we care about for websearch.</p>&mdash; 🍌 John 🍌 (@JohnMu) <a href="https://twitter.com/JohnMu/status/898152437399396352?ref_src=twsrc%5Etfw">August 17, 2017</a></blockquote>

<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

Are there any other uses for sitemaps?

Sitemaps play a critical role in migrations of large sets of URLs (e.g., moving your site from http to https or migrating a subdomain). In these cases, submitting the URLs you are migrating after you have set up redirects is the most effective way to prompt Google to process those redirects (you should also submit your new URLs in a separate sitemap). Monitoring the indexation status of these migration sitemaps also makes it easy to track Google’s progress in moving over the pages.

Conclusion

Maintaining a sitemap remains a must for proper crawling and indexation of your site, and should be among your top priorities when it comes to SEO. Though building a sitemap seems fairly straightforward in concept, the importance of clear and consistent signals between your site and your sitemap cannot be overstated. Since your sitemap is a critical building block for SEO, even small optimizations and corrections can result in much better crawling, indexation, and ultimately, rankings.

About Chris Brown

Chris Brown has nearly 20 years of retail leadership driving impressive results in a diverse range of retail business models, including pure-play ecommerce, brick and mortar, omnichannel (with substantial mobile expertise), and merchandising. As Vice President of Omni-channel and eCommerce Strategy, Chris connects with clients to help drive their digital strategy, combining his experience in high-growth retail environments with software solutions to build revenue, increase conversion, and drive retention. Connect with Chris on LinkedIn.
