Sitemap.xml is the map of your site that you explicitly hand to search engines, saying: "these are my important URLs, index them first". Unlike robots.txt, which says what not to crawl, the sitemap says what to crawl. Engines aren't obliged to follow it literally: it's a recommendation, not a command. But a well-tuned sitemap accelerates new-page indexing by 30 to 60% and helps the engine prioritise correctly across a big site. This article walks through how to validate a sitemap for free with our online SEO tool, and the pitfalls awaiting anyone who configures it sloppily.
The baseline check via the validator
Open /tools/sitemap-validator and paste your sitemap URL (usually domain.com/sitemap.xml). The tool runs three checks: first, XML schema compliance (no broken tags, correct element nesting); second, URL reachability (do they all return 200, or are there 404s and redirects); third, optional field correctness (lastmod in W3C datetime format, priority within 0.0 to 1.0). Output: a report highlighting problematic URLs.
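The first and third checks can be approximated offline with the standard library. The sketch below is not the validator's actual implementation; `check_sitemap` is a hypothetical helper, it only tests well-formedness rather than full schema compliance, and it skips the reachability check because that needs network requests.

```python
import re
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# W3C datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a timestamp with a timezone.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_sitemap(xml_text):
    """Return a list of (url, problem) tuples for a <urlset> document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Broken tags or bad nesting: nothing else can be checked.
        return [("<document>", f"not well-formed XML: {exc}")]

    problems = []
    for url in root.findall("sm:url", NS):
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        if not loc:
            problems.append(("<missing loc>", "url entry has no <loc>"))
            continue

        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None and not W3C_DATETIME.match(lastmod.strip()):
            problems.append((loc, f"lastmod is not W3C datetime: {lastmod!r}"))

        priority = url.findtext("sm:priority", namespaces=NS)
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append((loc, f"priority outside 0.0-1.0: {priority!r}"))
    return problems
```

Each returned tuple names the offending URL and the reason, which mirrors the kind of report described above.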
lastmod: the chief sitemap pain
The <lastmod> tag should reflect the date the page's content actually changed. Most sites stuff in the sitemap generation date, which is categorically wrong. If all 10,000 URLs have lastmod = 2026-06-07, Google sees this and ignores the field: "they don't actually know when things change". The correct approach is a real change date per URL. For blog posts, use the last edit date. For products, the last card update. For static pages, the creation date. If your CMS can't supply this, omit lastmod entirely rather than fake it.
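A quick way to spot the "generation date everywhere" pattern is to count how many URLs share a single lastmod day. This is a rough heuristic sketch: the function name, the 90% threshold and the minimum sample size are assumptions chosen for illustration, not values any search engine publishes.

```python
from collections import Counter


def lastmod_looks_generated(lastmods, threshold=0.9, min_urls=20):
    """Return True when nearly all lastmod values fall on the same day.

    lastmods: iterable of W3C datetime strings (empty or None entries are skipped).
    """
    # Compare on the date part only, so differing timestamps on one day still match.
    days = [value[:10] for value in lastmods if value]
    if len(days) < min_urls:
        return False  # too few URLs to judge
    _, most_common_count = Counter(days).most_common(1)[0]
    return most_common_count / len(days) >= threshold
```

If the function returns True for your sitemap, the lastmod values most likely come from the generator's clock rather than from real content changes.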
When to split into a sitemap index
The spec limits a single sitemap file to 50,000 URLs and 50 MB (uncompressed). If your site is smaller, one file is fine. If it is bigger, use a sitemap index with several child sitemaps. Splitting should follow site structure: sitemap-blog.xml for blog posts, sitemap-tools.xml for tools, sitemap-vs.xml for comparison pages. There are two benefits. First, Search Console shows per-section indexation stats ("blog indexed 95%, products 60%", so that's where the problem is). Second, the engine spots updates in one section faster, without re-reading the whole sitemap.
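Splitting by section and by the 50,000-URL limit can be sketched as follows. The `build_index` helper, its parameters and the numbered-suffix naming for oversized sections are assumptions made for illustration; only the sitemap-NAME.xml pattern comes from the text above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemap spec

# Serialise the sitemap namespace as the default xmlns (no prefix).
ET.register_namespace("", SITEMAP_NS)


def chunk(urls, size):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(sections, base_url, max_urls=MAX_URLS):
    """Build a sitemap index from {section name: [urls]}.

    Returns (index_xml, files) where files maps each child filename
    to the URLs that belong in it.
    """
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    files = {}
    for name, urls in sections.items():
        parts = list(chunk(urls, max_urls))
        for number, part in enumerate(parts, start=1):
            # Only add a numeric suffix when a section needs several files.
            suffix = f"-{number}" if len(parts) > 1 else ""
            filename = f"sitemap-{name}{suffix}.xml"
            files[filename] = part
            entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = f"{base_url}/{filename}"
    return ET.tostring(index, encoding="unicode"), files
```

Keeping one section per child file is what makes the per-section statistics in Search Console possible; the numeric suffix only appears when a single section outgrows one file.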
hreflang in sitemap for bilingual sites
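For a bilingual site (e.g., /ru and /en versions), hreflang is the markup Google relies on to pair the language versions of a page. One way is to declare it directly in the sitemap via xhtml:link rel="alternate", with the xhtml namespace (xmlns:xhtml="http://www.w3.org/1999/xhtml") declared on the urlset element. This is easier than wiring it into every page's head, especially for large sites. Example: for /ru/blog/post-1 you add <xhtml:link rel="alternate" hreflang="en" href="https://domain.com/en/blog/post-1" />. Each language version gets its own url entry, and each entry lists every variant, including itself. Our validator checks that hreflangs are present for all language variants, that the codes are valid (ru, en, en-US, x-default), and that the alternate URLs are reachable.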
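Generating the hreflang entries programmatically avoids forgetting a variant. The sketch below is one possible approach, not a required one: `build_urlset` is a hypothetical helper that takes a list of {hreflang: URL} groups and emits one url entry per language version, each listing all alternates.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"

# Sitemap elements go in the default namespace, alternates under xhtml:.
ET.register_namespace("", SM)
ET.register_namespace("xhtml", XHTML)


def build_urlset(groups):
    """groups: list of dicts mapping an hreflang code to an absolute URL."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for variants in groups:
        # One <url> per distinct URL (x-default may repeat an existing URL).
        for own_url in dict.fromkeys(variants.values()):
            url = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(url, f"{{{SM}}}loc").text = own_url
            # Every entry lists all variants, including itself.
            for lang, href in variants.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_urlset([
    {
        "ru": "https://domain.com/ru/blog/post-1",
        "en": "https://domain.com/en/blog/post-1",
    }
])
```

For the single bilingual post above, the result holds two url entries with two xhtml:link elements each.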
What should never end up in the sitemap
- URLs with query parameters (?utm_source=...): duplicates from a search perspective.
- URLs flagged noindex: if a page shouldn't be indexed, it shouldn't live in the map.
- URLs returning 404 or 410: dead pages in a sitemap waste the engine's crawl requests.
- URLs that redirect: list the final URL, not the intermediate one.
- Duplicates: the same URL listed in more than one child sitemap.
- URLs whose canonical points to a different page: there is no point indexing the duplicate.
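The exclusion rules in the list above translate into a simple filter over crawl data. The sketch below assumes you already have, for each page, its HTTP status, noindex flag and canonical URL; the `Page` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Page:
    url: str
    status: int
    noindex: bool = False
    canonical: Optional[str] = None


def exclusion_reason(page, seen):
    """Return why a page must stay out of the sitemap, or None if it may go in."""
    if urlsplit(page.url).query:
        return "query parameters"
    if page.noindex:
        return "noindex"
    if page.status in (404, 410):
        return "dead page"
    if 300 <= page.status < 400:
        return "redirect"
    if page.canonical and page.canonical != page.url:
        return "canonical points elsewhere"
    if page.url in seen:
        return "duplicate"
    return None


def filter_pages(pages):
    """Split pages into sitemap-worthy URLs and (url, reason) rejections."""
    seen, kept, rejected = set(), [], []
    for page in pages:
        reason = exclusion_reason(page, seen)
        if reason:
            rejected.append((page.url, reason))
        else:
            seen.add(page.url)
            kept.append(page.url)
    return kept, rejected
```

Returning the rejection reason alongside the URL makes it easier to see which rule removes the most pages.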
Submitted-to-indexed ratio
One of the most useful sitemap-quality metrics is the "submitted vs indexed" ratio in Google Search Console. If 10,000 URLs are submitted and only 4,000 are indexed (40%), that is bad. A healthy ratio is 80% or more. A low ratio means one of three things: you're pushing URLs Google doesn't want to index (thin content, duplicates); you have technical issues (slow server, 5xx errors during crawl); or pages are poorly linked internally (orphan pages). Site Metrics Tool tracks this ratio automatically and alerts if it drops by more than 10 percentage points in a week.
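The arithmetic behind the ratio and the alert is small enough to sketch. The function names are hypothetical and this is not Site Metrics Tool's actual code; the 80% and 10-point figures are the ones given above.

```python
HEALTHY_RATIO = 80.0  # percent, the "healthy" threshold from the text


def indexed_ratio(submitted, indexed):
    """Percentage of submitted URLs that are indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * indexed / submitted


def ratio_dropped(last_week, this_week, max_drop_points=10.0):
    """True when the ratio fell by more than max_drop_points percentage points."""
    return (last_week - this_week) > max_drop_points


ratio = indexed_ratio(10_000, 4_000)   # 40.0
is_healthy = ratio >= HEALTHY_RATIO    # False
```

Note that the alert compares percentage points, not relative change: a fall from 85% to 70% is a 15-point drop and triggers it, while 85% to 80% does not.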
Sitemap and rank tracking
When you use Site Metrics Tool for rank tracking, we automatically pull your sitemap daily and compare it against the URLs actually surfaced in the SERP for your keywords. This produces the critical signal "you want to rank /products/special, but Google shows /blog/special", i.e., cannibalisation between two pages targeting one intent. Without sitemap integration this is a manual, hours-long hunt through Search Console.
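The comparison itself can be illustrated with a few lines. This is a simplified sketch under assumed inputs (a keyword-to-target mapping and a keyword-to-ranking-URL mapping), not a description of how Site Metrics Tool works internally; `find_cannibalisation` is a hypothetical name.

```python
from urllib.parse import urlsplit


def path_of(url):
    """Normalise a URL or path to its path without a trailing slash."""
    return urlsplit(url).path.rstrip("/") or "/"


def find_cannibalisation(targets, serp_results, sitemap_urls):
    """Find keywords where a different page of the same site ranks.

    targets:      {keyword: URL or path you want to rank}
    serp_results: {keyword: URL that actually appears in the SERP}
    sitemap_urls: URLs from the sitemap, used to confirm the page is yours
    """
    known = {path_of(url) for url in sitemap_urls}
    conflicts = []
    for keyword, target in targets.items():
        ranked = serp_results.get(keyword)
        if ranked is None:
            continue  # nothing ranks for this keyword yet
        wanted, shown = path_of(target), path_of(ranked)
        if wanted != shown and shown in known:
            conflicts.append((keyword, wanted, shown))
    return conflicts
```

The sitemap acts as the list of pages you consider important, which is why a ranking URL is only reported as a conflict when it appears there.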
Frequently asked
Do I need to submit the sitemap to Search Console manually each time?
Once is enough. After the initial submission Google rechecks on its own schedule (usually daily). The Sitemap: directive in robots.txt also automates discovery.
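To confirm that robots.txt really advertises the sitemap, the directive can be read back with Python's standard robots.txt parser. The helper name is an assumption, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared via Sitemap: lines in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


example = """User-agent: *
Disallow: /admin/
Sitemap: https://domain.com/sitemap.xml
"""
declared = sitemaps_from_robots(example)
```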
Can I use .xml.gz?
Yes, engines support gzip-compressed sitemaps. Compression shrinks the file size, which is useful for big sites. The 50 MB limit applies to the uncompressed size, not the gzipped one.
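Because the limit is measured before compression, the size check belongs before the gzip step. A minimal sketch, with `gzip_sitemap` as a hypothetical helper:

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 50 MB, measured before compression


def gzip_sitemap(xml_bytes, limit=MAX_UNCOMPRESSED_BYTES):
    """Compress sitemap bytes, refusing input that exceeds the uncompressed limit."""
    if len(xml_bytes) > limit:
        raise ValueError(
            f"sitemap is {len(xml_bytes)} bytes uncompressed, limit is {limit}"
        )
    return gzip.compress(xml_bytes)
```

Write the returned bytes to sitemap.xml.gz; a file that passes only because it is small after compression would still violate the limit.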
Are images and videos needed in the sitemap?
Optional. Image sitemaps help engines find images, especially lazy-loaded ones that are invisible at first HTML render. Video sitemaps suit sites heavy on video. For an average blog they are usually not needed.
What if my sitemap exceeds 50 MB?
Split it into multiple child sitemaps and build a sitemap index. One index can reference up to 50,000 child sitemaps, which is effectively limitless for any realistic site size.