Duplicate pages: how to find and fix them in 30 minutes

Duplicate pages are URLs that serve identical or near-identical content. Search engines spend crawl budget on every copy but keep only one version in the index, and it is not necessarily the one you want. On large sites with 5,000+ URLs, duplicates can account for 30–50% of all URLs, and until you clean them up, content optimisation has limited effect. This article walks through the five main duplicate types, how to find them with free online SEO tools and Google Search Console, and which removal method fits each case.

Type 1: URL parameter duplicates

The most common case. The product page /product/iphone-15 shows identical content at /product/iphone-15?utm_source=email, /product/iphone-15?ref=instagram and /product/iphone-15?color=red (if your colour selector adds a parameter but doesn't actually change the page content). The engine sees these as four different pages with the same content. Fix: a rel=canonical tag in the <head> pointing to the parameter-free URL. Google retired the URL Parameters tool in Search Console in 2022, so for Google the canonical tag is now the main signal; for Yandex, the Clean-param directive in robots.txt tells the crawler which parameters to ignore.
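A minimal sketch of how a site could compute the canonical URL for the tag, assuming `utm_*` and `ref` are the only parameters that never change page content. The parameter list and the `shop.example` domain are illustrative assumptions; add others (such as a colour parameter that doesn't change the page) only if that holds on your site.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change what the page shows on this site.
# The list is illustrative; build yours from your own URLs and analytics.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"ref"}


def is_tracking(param: str) -> bool:
    name = param.lower()
    return name.startswith(TRACKING_PREFIXES) or name in TRACKING_PARAMS


def canonical_url(url: str) -> str:
    """Return the URL with content-neutral query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not is_tracking(k)]
    # The fragment (#...) is never sent to the server, so drop it as well.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url))}">'


print(canonical_link_tag("https://shop.example/product/iphone-15?utm_source=email"))
# prints <link rel="canonical" href="https://shop.example/product/iphone-15">
```

Parameters that do change content (say, `size`) survive, so those pages keep their own canonical.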

Type 2: www vs non-www, http vs https

A site can be reachable at four versions simultaneously: http://domain.com, http://www.domain.com, https://domain.com and https://www.domain.com. If all four return 200 with the same content, every page on the site, starting with the homepage, exists in four copies. Fix: pick one version as canonical and 301-redirect the other three to it, usually at the web-server level (e.g., nginx). Then, in Yandex Webmaster, confirm that version as the main mirror. Google Search Console no longer has a preferred-domain setting, so for Google the redirects (plus consistent canonical tags and sitemap URLs) are what count. Without the redirects, an engine can index the wrong version.
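In production this rule lives in the web-server config, but a small model of it is handy for unit-testing a redirect map or checking crawl output. The sketch below assumes https://www.example.com is the chosen canonical version; both constants are placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: https://www.example.com is the version you chose as canonical.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
BARE_HOST = CANONICAL_HOST[4:] if CANONICAL_HOST.startswith("www.") else CANONICAL_HOST


def mirror_redirect(url: str) -> Optional[str]:
    """Return the 301 target for a non-canonical scheme/host, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in {BARE_HOST, "www." + BARE_HOST}:
        return None  # not one of the four variants of our domain
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already the canonical version
    # Keep path and query so each page redirects to its own counterpart, not the homepage.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, ""))


for variant in ("http://example.com/about", "http://www.example.com/about",
                "https://example.com/about", "https://www.example.com/about"):
    print(variant, "->", mirror_redirect(variant))
```

Three of the four variants map to https://www.example.com/about; the canonical one returns None.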

Type 3: trailing slash and case

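/about and /about/ are two different URLs to search engines. If both return 200, you have a duplicate. Fix: pick one canonical form and 301 the other to it; a single nginx rule handles it. Case matters too: /About and /about are different URLs. Whether both resolve depends on the server and filesystem: on a case-insensitive filesystem (Windows, for example) both may return the same page, while nginx or Apache on a typical Linux filesystem treat them as different paths. Best option: 301-redirect any URL containing uppercase letters to its lowercase version.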
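Here is a sketch of the same normalisation as a testable function. It assumes canonical paths are all-lowercase and lets you choose the slash convention via `trailing_slash`, a hypothetical parameter for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def path_redirect(url: str, trailing_slash: bool = False) -> Optional[str]:
    """Return the 301 target for a non-canonical path, or None if the path is already canonical.

    Assumptions: canonical paths are all-lowercase, and `trailing_slash` picks
    whether they end in "/" (the site root is left alone either way).
    """
    parts = urlsplit(url)
    path = parts.path.lower()
    if path not in ("", "/"):
        # Strip any trailing slashes, then add one back only if that is the convention.
        path = path.rstrip("/") + ("/" if trailing_slash else "")
    if path == parts.path:
        return None
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(path_redirect("https://example.com/About/"))  # https://example.com/about
```

Before lowercasing everything, check that no path segment is case-sensitive by design (some IDs or tokens are), or those URLs will break.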

Type 4: pagination and filters

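Pagination pages /category?page=2, /category?page=3 aren't duplicates in the strict sense (their content differs), but they sometimes rank instead of the first page. Fix: Google's pagination guidance is to give each page in the series its own self-referencing canonical rather than pointing them all at page 1, since pages 2+ list different items. To favour page 1, point internal links and the sitemap at /category without the page parameter. Filters are trickier: /shoes?color=red is genuinely different content from /shoes. It depends on your strategy: if you want to rank for "buy red sneakers", keep the page indexable with a self-canonical; if not, noindex it.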
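The sketch below builds a self-referencing canonical for listing URLs under two assumptions: the page number is kept (page 1 collapses to the bare category URL), and only a hypothetical whitelist of filters you deliberately want indexed survives. Everything else, such as sort order or tracking parameters, is dropped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the filters you deliberately want indexed (e.g. to rank for
# "buy red sneakers"). Any other parameter is left out of the canonical.
INDEXABLE_FILTERS = {"color"}


def listing_canonical(url: str) -> str:
    """Self-referencing canonical for a category or listing URL."""
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query):
        if key == "page" and value == "1":
            continue  # page 1 is the bare category URL
        if key == "page" or key in INDEXABLE_FILTERS:
            kept.append((key, value))  # each paginated page keeps its own canonical
    kept.sort()  # fixed parameter order, so ?page=2&color=red and ?color=red&page=2 match
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(listing_canonical("https://shop.example/shoes?page=3&color=red&utm_source=x"))
# prints https://shop.example/shoes?color=red&page=3
```

Filter combinations outside the whitelist are the ones you would noindex, per the strategy above.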

Type 5: content duplicates (not URL)

The trickiest type: the URLs differ, but the content overlaps heavily. Example: you wrote three articles about rank monitoring, each targeting its own keyword, but 70% of the content overlaps. Engines may treat them as near-duplicates and rank only one. The fix depends on the situation. If the articles truly duplicate each other, merge them into one strong page and 301 the others to it (this overlaps with keyword cannibalization). If they should remain distinct, rewrite each until the overlap drops to around 20%: unique examples, different angles, a different FAQ.
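To put a number on "70% overlap" before and after a rewrite, one common technique is Jaccard similarity over word shingles (overlapping runs of k words). Search engines don't publish their exact near-duplicate method, so treat this as a rough internal yardstick; k = 5 is an arbitrary assumption.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Overlapping k-word sequences ("shingles") of the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two texts' shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Run it on the plain text of each article pair; a score near 0.7 matches the problem case above, and the goal of a rewrite is to push it toward 0.2.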

How to find duplicates

The fastest method is Google Search Console → Indexing → Pages → "Duplicate without user-selected canonical". This report lists the pages Google considers duplicates that don't declare a canonical; the neighbouring status "Duplicate, Google chose different canonical than user" shows where Google overrode your canonical. Second option: a Screaming Frog crawl, whose duplicate filters (in the Page Titles, Meta Description and Content tabs) surface pages with identical titles, meta descriptions or content. Third option: our free Sitemap validator (`/tools/sitemap-validator`), which compares the URLs in your sitemap with the content actually served and highlights duplicates. Sites with 1,000+ pages benefit from a dedicated audit that crawls every URL and compares content hashes.
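The content-hash comparison can be sketched in a few lines, assuming you already have each URL's HTML from a crawl. The tag stripping here is deliberately crude, and it only catches exact duplicates after normalisation. A real audit would also remove shared template parts (header, navigation, footer) before hashing, and would use a similarity measure for near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(page_html: str) -> str:
    """SHA-256 of the page's text with scripts, styles, tags and whitespace normalised away."""
    text = re.sub(r"<(script|style)\b.*?</(script|style)>", " ", page_html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)  # drop remaining tags
    text = " ".join(text.split()).lower()  # collapse whitespace, ignore case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose fingerprints match; `pages` maps URL -> fetched HTML."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, page_html in pages.items():
        groups[content_fingerprint(page_html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "https://a.example/x": "<p>Hello   world</p>",
    "https://a.example/x?ref=1": "<div>hello world</div>",
    "https://a.example/y": "<p>Other</p>",
}
print(exact_duplicate_groups(pages))
# prints [['https://a.example/x', 'https://a.example/x?ref=1']]
```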

When to use each removal method

301 redirect: when the duplicate isn't needed anymore and its link equity should pass to the main version. The strongest method.
rel=canonical: when the duplicate is needed for UX (e.g., a filtered category) but shouldn't be indexed separately. It is a hint, not a directive, so the engine may ignore it; when honoured, it consolidates most link equity.
noindex: when the duplicate must stay accessible but be fully excluded from the index. Little or no link equity passes.
410 Gone: when the page shouldn't exist at all. A clear, permanent signal that the URL is gone.
robots.txt Disallow: NOT for duplicate removal. It blocks crawling, not indexing, and it stops the crawler from ever seeing a canonical or noindex on the blocked page.
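The list above boils down to two questions: does a main version with the same content exist, and do users still need the duplicate URL? A sketch of that decision (a simplification; indexable filter pages, for example, are a strategy call rather than a duplicate):

```python
from enum import Enum


class Method(Enum):
    REDIRECT_301 = "301 redirect"
    CANONICAL = "rel=canonical"
    NOINDEX = "noindex"
    GONE_410 = "410 Gone"
    # robots.txt Disallow is deliberately absent: it blocks crawling, not indexing.


def choose_method(has_main_version: bool, users_need_it: bool) -> Method:
    """Map the rules of thumb above to one removal method."""
    if not users_need_it:
        # Nobody needs the URL: pass its equity on if there is somewhere to send it.
        return Method.REDIRECT_301 if has_main_version else Method.GONE_410
    # Users still need the URL, so it must keep returning 200.
    return Method.CANONICAL if has_main_version else Method.NOINDEX


print(choose_method(has_main_version=True, users_need_it=True).value)  # rel=canonical
```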

Frequently asked

What about thousands of duplicates at once?

Prioritise: first the top-100 traffic pages and their duplicates, then commercial categories, then the rest. Fully cleaning a large site can take 2–3 months, which is normal.
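A sketch of that ordering, assuming you have already grouped duplicates per main URL and joined in clicks (e.g. from a GSC export) plus your own "commercial" tagging. The `DuplicateGroup` structure and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DuplicateGroup:
    main_url: str
    duplicate_urls: list[str]
    monthly_clicks: int   # clicks to the main URL, e.g. from a GSC export
    is_commercial: bool   # your own tagging: category or product page


def prioritise(groups: list[DuplicateGroup], top_n: int = 100) -> list[DuplicateGroup]:
    """Order the cleanup: top-N traffic pages first, then commercial pages, then the rest."""
    by_traffic = sorted(groups, key=lambda g: g.monthly_clicks, reverse=True)
    top, rest = by_traffic[:top_n], by_traffic[top_n:]
    commercial = [g for g in rest if g.is_commercial]
    other = [g for g in rest if not g.is_commercial]
    return top + commercial + other
```

Within the second and third tiers, groups stay sorted by traffic, so the most-visited pages in each tier still come first.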

Can duplicates trigger a penalty?

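Not directly: Google doesn't take action against ordinary duplication, only against content duplicated to deceive users or manipulate rankings. The indirect costs are real, though: wasted crawl budget, diluted link equity and "the wrong" page in search results. The effect mimics a penalty: traffic drops and positions stagnate.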

Does Site Metrics Tool track duplicates?

Indirectly. We track `found_url` (the page actually shown in the SERP) and `target_url` (the page you want to rank). A mismatch signals cannibalization or duplication and is flagged in the dashboard.
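A hypothetical sketch of such a check (not Site Metrics Tool's actual code), reusing only the two field names from the answer. It separates a URL-variant mismatch, meaning the same page under another protocol, host, slash, case or query (types 1–3), from a genuinely different page (type 5 or cannibalization).

```python
from urllib.parse import urlsplit


def normalise(url: str) -> tuple[str, str]:
    """Host without "www." and lowercased path without trailing slash; scheme and query ignored."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # urllib already lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/").lower() or "/"
    return host, path


def classify(found_url: str, target_url: str) -> str:
    """Label a SERP result against the page you meant to rank."""
    if found_url == target_url:
        return "ok"
    if normalise(found_url) == normalise(target_url):
        return "url variant"  # a type 1-3 duplicate of the target
    return "different page"  # content duplicate or cannibalization


print(classify("http://example.com/a/", "https://www.example.com/a"))  # url variant
```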

Related articles

🧱

Sep 4, 2026 · 14 min read

Schema.org markup: the advanced 2026 guide

A complete guide to structured data: every type for every page, JSON-LD vs Microdata, @id linking, testing and debugging.

🌐

Aug 19, 2026 · 13 min read

Hreflang for multilingual sites: the complete 2026 guide

What hreflang is, how to set it up correctly for two or more languages, typical mistakes, and why Google and Yandex treat hreflang differently.

⚙️

Aug 3, 2026 · 13 min read

JavaScript SEO in 2026: SPA, hydration, and why your page won't index

Why JavaScript-heavy sites index poorly, how Google and Yandex render JS, SSR/SSG/CSR differences, and a fix checklist.

🦈

Jul 14, 2026 · 13 min read

Keyword cannibalization: how to find and fix it

What keyword cannibalization is, why it kills rankings, how to detect it via GSC and Webmaster, and three proven strategies to fix it.
