A keyword list is the ordered set of queries you want to rank for. Its quality drives everything in SEO: which pages to create, which titles to write, which intents to optimise for, which pages compete with each other, which ones convert. Most teams "wing it": open Wordstat, copy the first 50 phrases, call it done. A year later they find that 80% of those keywords drive no clicks, and that for the most valuable commercial query an internal page ranks instead of the landing page. This article walks through a sound methodology for keyword research, covering Google and Yandex SEO in parallel.
Query sources
A healthy keyword set draws from 5–7 different sources, because each surfaces a different slice. First, your own GSC and Yandex Webmaster: queries you're already shown for (this is "real demand" already driving impressions). Second, Yandex Wordstat: an initial frequency sweep. Third, Google Keyword Planner: monthly search volume and Google Ads competition. Fourth, competitor analysis: take 5–7 top sites in the niche and pull their keyword sets via third-party tools. Fifth, search suggestions: open Google or Yandex, start typing, and collect the completions that appear as you type the first 4–6 letters. Sixth, Google's "People Also Ask" and its analogues. Seventh, internal site-search stats (if you have a search bar, GA's Site Search section shows what users actually search for).
Initial cleaning and filtering
After collection you have 3,000–10,000 phrases. Most of it is junk. Strip out: geo-bound queries for cities you don't serve (if you're Moscow-only, drop "buy in Krasnodar"); competitor brand queries you don't want to rank for (unless you legitimately target "X vs Y" comparisons); obviously irrelevant intents (info queries on a commercial page or vice versa); duplicates by stem. Set a minimum threshold of 10 impressions per month in Wordstat; anything lower is noise, and tracking it doesn't pay back. You should end up with 200–800 phrases, depending on site size.
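The filtering pass above can be sketched in a few lines of Python. This is a minimal illustration, not a production cleaner: the Phrase type, the exclusion sets, and the word-set duplicate key (a crude stand-in for real stemming) are all assumptions made for the example. Only the 10 impressions/month threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    text: str
    monthly_impressions: int  # e.g. the figure reported by Wordstat


# Hypothetical exclusion lists; replace with your own.
UNSERVED_CITIES = {"krasnodar", "kazan"}
COMPETITOR_BRANDS = {"acme"}
MIN_IMPRESSIONS = 10  # threshold suggested in the text


def duplicate_key(text: str) -> tuple:
    """Crude duplicate key: lowercase, unique words, sorted.

    This is NOT stemming; swap in a real stemmer for production use.
    """
    return tuple(sorted(set(text.lower().split())))


def clean(phrases: list) -> list:
    """Drop low-volume, off-geo and competitor phrases, then deduplicate."""
    kept = {}
    for p in phrases:
        words = set(p.text.lower().split())
        if p.monthly_impressions < MIN_IMPRESSIONS:
            continue  # below the noise threshold
        if words & UNSERVED_CITIES or words & COMPETITOR_BRANDS:
            continue  # city we don't serve, or a competitor brand
        key = duplicate_key(p.text)
        best = kept.get(key)
        # Among duplicates, keep the variant with the most impressions.
        if best is None or p.monthly_impressions > best.monthly_impressions:
            kept[key] = p
    return list(kept.values())
```

Irrelevant-intent filtering is left out on purpose: it usually needs a human pass or a classifier, and a word list will not do it well.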
Clustering by intent
A cluster is a set of queries for which Google or Yandex serves the same pages in the top 10. If "buy minecraft" and "minecraft licence price" share 7 of 10 top sites, that's one cluster, and one page should target it. If the top-10 lists are different, that's two clusters and two pages. This works better than clustering by semantic similarity because it uses actual ranking signals from the engines. Mechanically: pull the top-10 URLs for each phrase, compute the pairwise intersection, and merge phrases whose overlap exceeds 30% (more than 3 shared URLs out of 10). Automate this via a custom script or specialised SEO tools.
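Here is one way the overlap-based clustering could look as a custom script. It is a sketch under stated assumptions: the top-10 URLs are already collected into a dict (fetching SERPs is out of scope), and phrases are merged transitively with a small union-find, which is one reasonable reading of "glue" rather than the only one.

```python
from itertools import combinations


def overlap(a: list, b: list) -> float:
    """Share of top-10 URLs that two result lists have in common."""
    return len(set(a) & set(b)) / max(len(a), len(b), 1)


def cluster(serps: dict, threshold: float = 0.3) -> list:
    """Group phrases whose top-10 overlap exceeds the threshold.

    serps maps phrase -> list of top-10 URLs (assumed already fetched).
    Merging is transitive: if A~B and B~C, all three share a cluster.
    """
    parent = {phrase: phrase for phrase in serps}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(serps, 2):
        if overlap(serps[p], serps[q]) > threshold:
            parent[find(p)] = find(q)

    groups = {}
    for phrase in serps:
        groups.setdefault(find(phrase), set()).add(phrase)
    return list(groups.values())
```

Transitive merging can chain loosely related phrases into one large cluster. If that happens on your data, raise the threshold or require overlap with every member of the cluster.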
Mapping queries to pages
Every cluster gets one — and only one — target page on the site. This rule gets ignored constantly: teams create 4 different pages for similar queries, and they cannibalise each other — Google can't tell which to surface and none ranks well. In the Site Metrics Tool keyword table the target_url column shows which page you intend to rank. After GSC sync we also show found_url — the page that actually appeared in the SERP. If target_url ≠ found_url, you have cannibalisation: either collapse the two pages into one (301 from weaker to stronger), or sharply diverge their intents in copy.
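If you export the keyword table, the target_url versus found_url check is easy to script. The sketch below assumes each row is a dict with keyword, target_url and found_url keys; that row shape is an assumption for illustration, not the tool's documented export format. URLs are compared loosely (scheme, "www." and trailing slash ignored), which is also a simplification.

```python
from urllib.parse import urlsplit


def page_key(url: str) -> tuple:
    """Normalise a URL to (host, path), ignoring scheme, 'www.' and trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return host, parts.path.rstrip("/")


def find_cannibalised(rows: list) -> list:
    """Return rows where the page found in the SERP is not the intended one.

    Rows with no found_url (keyword not ranking yet) are skipped.
    """
    return [
        row for row in rows
        if row.get("found_url")
        and page_key(row["target_url"]) != page_key(row["found_url"])
    ]
```

Query strings are ignored here. If your site serves distinct pages from the same path via parameters, include parts.query in the key.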
Priority allocation
Not all queries are equal. Score the keyword set on two axes, search volume (low/mid/high) and intent (informational/navigational/commercial/transactional), and note competition alongside them. High-volume, low-competition transactional queries are gold: they get the most page-optimisation and link-building budget. High-volume informational ones power the blog and top-of-funnel content. Long-tail transactional queries are usually easy to rank and convert immediately, which makes them a great place to start. In Site Metrics Tool, use tags: tag1=intent, tag2=priority, tag3=funnel-stage. The keyword filters will then slice neatly along these axes.
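The scoring rules above can be turned into a small tagging helper. Treat this as an illustrative sketch: the p1/p2/p3 labels, the mapping from intent to funnel stage, and the decision to treat low-volume transactional queries as "long tail" are assumptions made for the example. Only the tag1/tag2/tag3 roles come from the text.

```python
# Assumed mapping from intent to funnel stage (not defined in the article).
FUNNEL_STAGE = {
    "informational": "top",
    "navigational": "middle",
    "commercial": "middle",
    "transactional": "bottom",
}


def tag_keyword(volume: str, intent: str, competition: str) -> dict:
    """Return tag1=intent, tag2=priority, tag3=funnel-stage for one keyword.

    volume and competition are 'low', 'mid' or 'high'.
    """
    if intent not in FUNNEL_STAGE:
        raise ValueError(f"unknown intent: {intent!r}")

    gold = intent == "transactional" and volume == "high" and competition == "low"
    long_tail = intent == "transactional" and volume == "low"

    if gold or long_tail:
        priority = "p1"  # biggest budget, or quick wins to start with
    elif intent in ("transactional", "commercial") or (
        intent == "informational" and volume == "high"
    ):
        priority = "p2"  # remaining money queries and blog/top-of-funnel drivers
    else:
        priority = "p3"
    return {"tag1": intent, "tag2": priority, "tag3": FUNNEL_STAGE[intent]}
```

Keeping the rules in one function makes them easy to review and adjust when your budget or niche changes.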
Keeping the keyword set current
A keyword set is a living organism. Quarterly, run an "inventory": prune queries that produced no impressions for a year; add new ones surfacing in GSC and Webmaster on their own; re-check clustering (Google periodically shifts which pages it serves — last year's single cluster may have split). In parallel, track new keyword emergence in the industry: competitor product launches, trend shifts, seasonality. Don't believe "build it once, ride it five years" — half your keywords will be irrelevant in 12 months.
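The prune-and-add part of the quarterly inventory could be scripted roughly as follows. The input shapes are assumptions: tracked maps each monitored keyword to the last date it received impressions (None if never), and seen_in_console is the set of queries currently reported by GSC and Webmaster. Re-checking clusters is a separate step and is not covered here.

```python
from datetime import date, timedelta
from typing import Optional


def inventory(
    tracked: dict,
    seen_in_console: set,
    today: Optional[date] = None,
) -> tuple:
    """Return (keywords_to_prune, keywords_to_add), both sorted.

    tracked: keyword -> last date with impressions, or None if never seen.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=365)  # "no impressions for a year"
    prune = sorted(
        kw for kw, last_seen in tracked.items()
        if last_seen is None or last_seen < cutoff
    )
    # Queries the engines already show you for but you are not tracking yet.
    add = sorted(seen_in_console - set(tracked))
    return prune, add
```

New candidates from the add list should still go through the same cleaning and clustering steps before they are tracked.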
Integration with the rank tracker
Once the keyword set is built and clustered, load it into a rank tracker. Site Metrics Tool supports bulk import: a CSV with keyword,target_url,tags,refresh_interval columns loads in one click. From here, Google and Yandex monitoring runs in parallel: positions get pulled every 6–24 hours (depending on refresh_interval), history accumulates, and the dashboard shows at a glance which clusters grow, which drop, and where cannibalisation appears. Without a tracker, your keyword set lives in Excel and goes stale in 3 months.
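A bulk-import file with the four columns named above can be generated with Python's standard csv module. The column names come from the text; the value formats are assumptions for this sketch (tags joined with ";" and refresh_interval expressed in hours), so check the tool's import documentation for the formats it actually expects.

```python
import csv
import io

COLUMNS = ["keyword", "target_url", "tags", "refresh_interval"]


def build_import_csv(rows: list) -> str:
    """Serialise keyword rows into CSV text with the expected header.

    Each row is a dict with keyword, target_url, tags (a list of strings)
    and refresh_interval (assumed here to be hours).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "keyword": row["keyword"],
            "target_url": row["target_url"],
            "tags": ";".join(row["tags"]),  # assumed tag separator
            "refresh_interval": row["refresh_interval"],
        })
    return buffer.getvalue()
```

Using csv.DictWriter rather than string concatenation means keywords containing commas or quotes are escaped correctly.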
Frequently asked
How many keywords does a typical site need?
For a blog or content site — 300–800 phrases actively monitored. For e-commerce with 1000+ SKUs — 1,000–3,000. For a landing page or SaaS — 50–150 (narrow but deep). The rule: each keyword needs a defined target page. If it doesn't have one, don't track it.
How often to refresh the keyword set?
Minor edits (adding 5–10 new keywords from GSC): weekly. Full revision: quarterly. Radical overhaul: yearly, or after a major algorithm update.
Can I use one keyword set for both Google and Yandex?
Base set — yes, overlap is heavy. But add 10–15% Yandex-specific queries (regionality, language quirks) and cluster them separately — Yandex clusters often differ from Google's. In Site Metrics Tool one keyword table covers both engines; the source filter splits the data.
What to do with queries that "won't grow"?
If a position has parked at 30–80 for 6 months, either deeply rework the page (content, relevance, UX) or drop the keyword from monitoring. If the page has few or no backlinks, try link building before giving up on it. Wasting tracker slots on "stuck" phrases is just paying rent.