Duplicate pages are URLs that serve identical or near-identical content. Search engines waste crawl budget on them and keep only one version in the index, usually not the one you want. On large sites with 5,000+ URLs, duplicates often account for 30-50% of all URLs, and until you clean them up, content optimisation won't deliver its full effect. This article walks through the five main duplicate types, how to find them with free online SEO tools and Google Search Console, and which removal method fits each case.
Type 1: URL parameter duplicates
The most common case. The product page /product/iphone-15 shows identical content at /product/iphone-15?utm_source=email, /product/iphone-15?ref=instagram and /product/iphone-15?color=red (if the colour filter appends a parameter without actually changing the page). The engine sees these as four different pages with the same content. Fix: a rel=canonical tag in the <head> of every variant, pointing to the parameter-free URL. Google retired the Search Console "URL Parameters" tool in 2022, so per-parameter settings there are no longer an option; for Yandex you can still declare such parameters with the Clean-param directive in robots.txt.
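To make the canonical rule concrete, here is a minimal sketch of how a template could compute the canonical URL for any parameter variant. It assumes that `utm_*`, `ref` and `color` never change what the page shows; the `NON_CONTENT_PARAMS` set and the function names are hypothetical and must be adapted to your site, since a parameter that does change content must be kept.

```python
import html
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list: parameters that never change what this site's pages show.
# Anything starting with "utm_" is treated the same way.
NON_CONTENT_PARAMS = {"ref", "color"}


def canonical_url(url: str) -> str:
    """Return url with tracking/non-content parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in NON_CONTENT_PARAMS
    ]
    # Fragments never reach the server, so they are dropped as well.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """The <link> element to place in the <head> of every variant of the page."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url), quote=True)}">'


print(canonical_tag("https://example.com/product/iphone-15?utm_source=email"))
# prints <link rel="canonical" href="https://example.com/product/iphone-15">
```

Parameters that are not on the list (say, a `size` that really changes the page) survive, so those variants keep a self-referencing canonical.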
Type 2: www vs non-www, http vs https
A site can be reachable at four versions simultaneously: http://domain.com, http://www.domain.com, https://domain.com, https://www.domain.com. If all four return 200 with content, you have four duplicates of the homepage (and of every other page). Fix: pick one version as canonical and have the other three return a 301 to it, configured at the web-server level (e.g. in nginx). Then set the canonical version as the "primary mirror" in Yandex Webmaster. Google Search Console dropped its preferred-domain setting in 2019, so for Google the 301s themselves carry the signal. Without them, the engine may index the wrong version.
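The redirect itself belongs in the web-server config, but the mapping it has to implement is easy to state, and to unit-test against a list of URLs, in Python. This sketch assumes https://www.example.com is the chosen canonical version; the host constants and `mirror_redirect` are placeholders, not part of any tool.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: https://www.example.com is the version you picked as canonical.
BARE_HOST = "example.com"
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www." + BARE_HOST


def mirror_redirect(url: str) -> Optional[str]:
    """Return the 301 target for a non-canonical mirror, or None if url should get a 200."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in (BARE_HOST, CANONICAL_HOST):
        return None  # not one of this site's four mirrors
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already the canonical version
    # Keep path and query so every page maps to its own canonical twin, not the homepage.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, ""))


for variant in ("http://example.com/about", "http://www.example.com/about",
                "https://example.com/about", "https://www.example.com/about"):
    print(variant, "->", mirror_redirect(variant))
```

Three of the four variants should map to https://www.example.com/about and the last to None; your nginx server blocks should produce exactly that mapping.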
Type 3: trailing slash and case
/about and /about/ are two different URLs to engines. If both return 200, you have a duplicate. Fix: choose one form as canonical and 301 the other to it; a single nginx rule handles this. Case matters too: /About and /about are different URLs. Whether both resolve depends on the server: Windows-based setups such as IIS typically match paths case-insensitively and serve the same page at both, while nginx on Linux treats them as distinct. Best option: 301-redirect any URL path containing uppercase letters to its lowercase form.
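A minimal sketch of the slash-and-case rule, assuming every real page on the site lives at a lowercase path and that you have picked one trailing-slash policy. Only the path is lowercased, because query values can be case-sensitive; paths whose last segment contains a dot (e.g. /sitemap.xml) never get a slash added. Function names are illustrative.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_path(path: str, trailing_slash: bool = False) -> str:
    """Lowercase the path and apply a single trailing-slash policy."""
    path = path.lower()
    stripped = path.rstrip("/")
    if not stripped:
        return "/"  # the root always normalizes to "/"
    last_segment = stripped.rsplit("/", 1)[-1]
    if trailing_slash and "." not in last_segment:
        return stripped + "/"  # directory-style URL: add the slash
    return stripped


def slash_case_redirect(url: str, trailing_slash: bool = False) -> Optional[str]:
    """Return the 301 target if url breaks the policy, else None (serve 200)."""
    parts = urlsplit(url)
    target = normalize_path(parts.path, trailing_slash)
    if target == parts.path:
        return None
    # Query string is kept exactly as sent.
    return urlunsplit((parts.scheme, parts.netloc, target, parts.query, parts.fragment))


print(slash_case_redirect("https://example.com/About/"))
# prints https://example.com/about
```

Pass `trailing_slash=True` if your canonical form is /about/ instead; the same function then redirects /about to /about/.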
Type 4: pagination and filters
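Pagination pages /category?page=2, /category?page=3 aren't duplicates in the strict sense (their content differs), but they often get indexed instead of the "right" first page. Fix: the usual approach is to canonicalize all pagination pages to /category (no page param). Be aware that Google's own pagination guidance recommends a self-referencing canonical on each paginated page instead, because page 2+ carries different content and the hint may be ignored; decide based on which engine matters most to you. Filters are trickier: /shoes?color=red is genuinely different content from /shoes. It depends on your strategy: if you want to rank for "buy red sneakers", keep the page indexable with a self-canonical; if not, noindex it.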
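The decision for listing pages can be sketched as one function that returns the canonical URL and robots value for any category URL. This assumes a `page` parameter and a hypothetical `INDEXABLE_FILTERS` set holding the filters you deliberately want to rank for; `paginate_to_first` switches between canonicalizing paginated pages to the first page and giving each page a self-referencing canonical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

PAGE_PARAM = "page"
# Hypothetical: filters you deliberately want to rank for (e.g. "buy red sneakers").
INDEXABLE_FILTERS = {"color"}


@dataclass
class IndexDirective:
    canonical: str  # href for <link rel="canonical">
    robots: str     # content for <meta name="robots">


def directive_for(url: str, paginate_to_first: bool = True) -> IndexDirective:
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    filters = [(k, v) for k, v in params if k != PAGE_PARAM]

    def build(pairs):
        return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pairs), ""))

    if any(k not in INDEXABLE_FILTERS for k, _ in filters):
        # A filter you don't target: visitors can use it, the index doesn't get it.
        return IndexDirective(canonical=build(params), robots="noindex, follow")
    if paginate_to_first:
        # Page 2, 3, ... point to the same listing without the page parameter.
        return IndexDirective(canonical=build(filters), robots="index, follow")
    # Alternative policy: every page, paginated or not, canonicalizes to itself.
    return IndexDirective(canonical=build(params), robots="index, follow")


print(directive_for("https://example.com/shoes?color=red&page=3"))
```

Keeping the filter and pagination rules in one place makes it easy to flip the pagination policy later without touching templates.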
Type 5: content duplicates (not URL)
The trickiest type. The URLs differ, but the content overlaps heavily. Example: you wrote three articles about rank monitoring, each targeting its own keyword, but 70% of the content overlaps. Engines treat them as duplicates and rank only one. The fix depends on the situation: if the articles truly duplicate each other, merge them into one strong page and 301 the others to it (this overlaps with the cannibalization topic); if they should remain distinct, rewrite each until the overlap drops to around 20%, using unique examples, different angles and different FAQs.
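"Overlap" needs a number before you can act on it. One common way to get one, swapped in here as an assumption rather than the method behind the percentages above, is word shingling: split each text into runs of k consecutive words and compute the Jaccard similarity of the two sets. The 0.7 default threshold simply mirrors the 70% figure; Jaccard scores are not directly comparable to an informal "70% overlap", so calibrate on your own articles.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 5) -> Set[Tuple[str, ...]]:
    """All runs of k consecutive words (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the shingle sets: 0.0 = nothing shared, 1.0 = identical."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def overlapping_pairs(docs: Dict[str, str], threshold: float = 0.7,
                      k: int = 5) -> List[Tuple[str, str, float]]:
    """Pairs of documents whose overlap is at or above threshold."""
    pairs = []
    for a, b in combinations(sorted(docs), 2):
        score = overlap(docs[a], docs[b], k)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs
```

Feed it the body text (not the full HTML) of each article keyed by URL; pairs it returns are candidates for merging or rewriting.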
How to find duplicates
Fastest method: Google Search Console → Indexing → Pages → "Duplicate without user-selected canonical". Here Google tells you directly which pages it considers duplicates that don't declare a canonical. Second: a Screaming Frog crawl with Spider → Duplicates surfaces pages with identical titles, meta descriptions or content. Third: our free Sitemap validator (`/tools/sitemap-validator`) compares the URLs in your sitemap with the content they actually serve and highlights duplicates. For sites with 1,000+ pages, a dedicated audit that crawls every URL and compares content hashes is worth the effort.
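The content-hash comparison can be sketched in a few lines, assuming you already have a URL-to-HTML map from a crawl (fetching is out of scope here). The normalization is deliberately crude: it strips tags, collapses whitespace and lowercases, so it catches exact duplicates after cleanup but not near-duplicates, which need a similarity measure instead. Function names are illustrative.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def content_fingerprint(page_html: str) -> str:
    """SHA-256 of the page text after rough normalization."""
    text = re.sub(r"<[^>]+>", " ", page_html)       # drop tags
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace, ignore case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def duplicate_groups(pages: Dict[str, str]) -> List[List[str]]:
    """Groups of URLs whose normalized content is identical."""
    groups = defaultdict(list)
    for url, page_html in pages.items():
        groups[content_fingerprint(page_html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "/a": "<p>Hello  World</p>",
    "/b": "<div>hello world</div>",
    "/c": "<p>Other</p>",
}
print(duplicate_groups(pages))
# prints [['/a', '/b']]
```

In practice you would hash only the main content area, since shared headers, footers and scripts otherwise dominate the text.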
When to use each removal method
- 301 redirect: when the duplicate isn't needed anymore and all its link equity should pass to the main version. The strongest method.
- rel=canonical: when the duplicate is needed for UX (e.g. a filtered category) but shouldn't be indexed separately. Passes most link equity.
- noindex: when the duplicate must stay accessible but be fully excluded from the index. No link equity passes.
- 410 Gone: when the page shouldn't exist at all. A hard signal to the engine.
- robots.txt Disallow: NOT for duplicate removal. It blocks crawling, not indexing, and the engine never sees the canonical or noindex tag on a blocked page.
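The list above boils down to two yes/no questions: do users still need the duplicate, and is there a main version to consolidate into? A sketch of that mapping follows; the parameter names are hypothetical, and robots.txt Disallow is left out on purpose.

```python
from enum import Enum


class Method(Enum):
    REDIRECT_301 = "301 redirect"
    CANONICAL = "rel=canonical"
    NOINDEX = "noindex"
    GONE_410 = "410 Gone"
    # robots.txt Disallow is intentionally absent: it blocks crawling, not indexing.


def choose_method(needed_by_users: bool, has_main_version: bool) -> Method:
    """Map the two questions behind the list to a removal method."""
    if has_main_version:
        # Users still need it: keep it and consolidate signals via canonical.
        # Nobody needs it: 301 so its link equity moves to the main version.
        return Method.CANONICAL if needed_by_users else Method.REDIRECT_301
    # No main version to consolidate into.
    return Method.NOINDEX if needed_by_users else Method.GONE_410


print(choose_method(needed_by_users=True, has_main_version=True).value)
# prints rel=canonical
```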
Frequently asked
What about thousands of duplicates at once?
Prioritise: first the top-100 traffic pages and their duplicates, then commercial categories, then the rest. Fully cleaning a large site takes 2-3 months, which is normal.
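That ordering can be expressed as a tiered sort. This sketch assumes each duplicate cluster has been tagged with its main URL's monthly traffic and whether it is a commercial category; the `DuplicateCluster` fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DuplicateCluster:
    main_url: str
    monthly_traffic: int          # traffic of the main URL, e.g. from analytics
    is_commercial: bool           # is the main URL a commercial category?
    duplicates: List[str] = field(default_factory=list)


def cleanup_order(clusters: List[DuplicateCluster], top_n: int = 100) -> List[DuplicateCluster]:
    by_traffic = sorted(clusters, key=lambda c: c.monthly_traffic, reverse=True)
    top_urls = {c.main_url for c in by_traffic[:top_n]}

    def tier(c: DuplicateCluster) -> int:
        if c.main_url in top_urls:
            return 0  # top-N traffic pages first
        if c.is_commercial:
            return 1  # then commercial categories
        return 2      # then everything else

    # sorted() is stable, so clusters stay in traffic order within each tier.
    return sorted(by_traffic, key=tier)
```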
Can duplicates trigger a penalty?
There's no direct penalty. The indirect effects are real, though: wasted crawl budget, diluted link equity, and the engine surfacing "the wrong" page. The result mimics a penalty: traffic drops and positions stagnate.
Does Site Metrics Tool track duplicates?
Indirectly. We track `found_url` (the page actually shown in the SERP) and `target_url` (the page you wanted to rank). A mismatch is a cannibalization or duplicate signal and is flagged in the dashboard.
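The idea behind the mismatch check can be sketched as follows. This is an illustration, not Site Metrics Tool's actual implementation: the `RankCheck` record and the labels are hypothetical. It separates mismatches that are only URL variants of the target (scheme, www, trailing slash, case, query parameters) from cases where a genuinely different page ranks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlsplit


@dataclass
class RankCheck:
    keyword: str
    target_url: str
    found_url: Optional[str]  # None when the site doesn't rank for the keyword


def _loose(url: str) -> Tuple[str, str]:
    """Host without www and lowercased path without trailing slash; scheme and query ignored."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.lower().rstrip("/") or "/"


def classify(check: RankCheck) -> str:
    if check.found_url is None:
        return "not ranking"
    if check.found_url == check.target_url:
        return "ok"
    if _loose(check.found_url) == _loose(check.target_url):
        return "duplicate URL variant"   # types 1-3: parameters, mirrors, slash/case
    return "different page ranking"      # possible cannibalization or content duplicate


print(classify(RankCheck("buy sneakers", "https://www.example.com/shoes", "http://example.com/Shoes/")))
# prints duplicate URL variant
```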