How to Measure AI Search Visibility: Metrics, Tools and Method (GEO)

By Morris McLane | 9 June 2026 | 5 min read

[Image: an abstract field of glowing blue data points on black, evoking the AI answers that GEO is measured by.]

You cannot measure Generative Engine Optimisation the way you measure SEO. There is no ranking to check and no position to report. AI engines compose answers from many sources at once and publish no league table. So you measure GEO by observation: you watch what the engines actually say about you, on the questions that matter, and you track how it changes against a baseline. Here is how to do that in practice.

This is the measurement layer beneath generative engine optimisation as a whole. If GEO is the work, this is how you tell whether it is working.

Why measuring GEO is different from SEO

SEO gives you a number. You can see where a page ranks for a query and watch the position move week to week. GEO gives you no such number. When someone asks ChatGPT, Gemini, Perplexity or Google’s AI Overviews about your category, the engine synthesises an answer and may cite a handful of sources, but it does not rank you against anyone, and the same question can return a different answer tomorrow.

So measurement shifts from position to presence and accuracy. The question is no longer “where do I rank?” but “when the engine answers, does it mention me, is what it says correct, and where did it get it?” You measure by sampling the answers themselves, repeatedly, and comparing each cycle to a baseline.

The four things to track

A workable GEO measurement programme watches four things over time.

Presence. For each priority question, are you mentioned or cited at all, and by which engines? Presence is the foundation: you cannot be chosen if you are never in the answer.
Accuracy. When you are mentioned, is what the engine says correct, current and on message, or is it outdated, vague, or conflated with a competitor? For a regulated, listed or contested business, an inaccurate but confident AI answer is an exposure in its own right, so accuracy is tracked as carefully as presence.
Source coverage. Which pages and third-party sources did the engine draw from to build the answer, and are they yours or someone else’s? This tells you where the model’s view of you actually comes from, and therefore where to do the work.
Share of answer. Across your priority questions, how often do you appear versus the competitors in your category? This is the closest thing GEO has to a ranking: a comparative measure of who owns the answer space.
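
As a minimal sketch of how the first three measures might be computed once answers have been captured and tagged by hand, the Python below assumes a hypothetical `Observation` record with one entry per question and engine. The field names, the `OWNED_DOMAINS` set and the sample data are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlparse

# Assumption: the domains you control. Replace with your own.
OWNED_DOMAINS = {"example.com"}


@dataclass
class Observation:
    """One tagged answer: a single question asked of a single engine."""
    question: str
    engine: str
    mentioned: bool                   # presence: were we named or cited at all?
    accurate: Optional[bool] = None   # None when we were not mentioned
    cited_urls: List[str] = field(default_factory=list)


def presence_rate(observations):
    """Share of captured answers that mention us."""
    if not observations:
        return 0.0
    return sum(o.mentioned for o in observations) / len(observations)


def accuracy_rate(observations):
    """Share of mentions tagged accurate; None if we were never mentioned."""
    mentions = [o for o in observations if o.mentioned]
    if not mentions:
        return None
    return sum(o.accurate is True for o in mentions) / len(mentions)


def is_owned(url, owned=OWNED_DOMAINS):
    """True when the URL's host is an owned domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in owned)


def owned_source_share(observations):
    """Share of all cited URLs that point at owned domains; None if no citations."""
    urls = [u for o in observations for u in o.cited_urls]
    if not urls:
        return None
    return sum(is_owned(u) for u in urls) / len(urls)


# Hypothetical sample data, for illustration only.
sample = [
    Observation("Who leads <category>?", "ChatGPT", True, True,
                ["https://www.example.com/about", "https://news.example.org/review"]),
    Observation("Who leads <category>?", "Gemini", True, False,
                ["https://directory.example.net/listing"]),
    Observation("Who leads <category>?", "Perplexity", False, None,
                ["https://competitor.example.org/"]),
    Observation("Is <company> right for <use case>?", "ChatGPT", True, True,
                ["https://example.com/services"]),
]

print(presence_rate(sample))       # 3 of 4 answers mention us: 0.75
print(accuracy_rate(sample))       # 2 of 3 mentions are accurate
print(owned_source_share(sample))  # 2 of 5 cited URLs are ours: 0.4
```

Accuracy is computed over mentions only, so a question where you are absent lowers presence without distorting the accuracy figure.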

How to build your baseline

You cannot improve a picture you have not captured. A baseline has three parts.

1. The question set. Write down the questions your buyers, candidates, journalists and stakeholders actually ask an AI engine about your category, your competitors and you. Keep it fixed: the same questions every cycle, so the trend is comparable. Twenty to fifty well-chosen questions is usually enough to be representative without becoming unmanageable.

2. The engines. Run the set through the engines your audience uses: ChatGPT, Google’s AI Overviews, Gemini, Perplexity and Copilot. Weight your effort towards the ones that matter most to your buyers rather than spreading evenly across all of them.

3. The cadence. Pick a regular rhythm, typically monthly, with closer attention around model updates, product launches, hiring drives or contested moments. Capture the full answer and its citations each time, not just a yes or no, so you can see how the engine talks about you, not only whether it does.

That first capture is your baseline. Everything after is measured against it.
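
One way to keep the question set and engine list fixed is to write them down as data and generate the capture slots from them. The sketch below is illustrative only: the placeholder questions, the `capture_plan` helper and the cycle label format are assumptions, not a prescribed structure.

```python
from itertools import product

# Hypothetical question set; replace the placeholders with your own questions.
QUESTIONS = [
    "Who are the leading providers in <category>?",
    "Is <company> a good choice for <use case>?",
]

# The engines named in this article; weight or trim to suit your audience.
ENGINES = ["ChatGPT", "Google AI Overviews", "Gemini", "Perplexity", "Copilot"]


def capture_plan(questions, engines, cycle):
    """Return one empty capture slot per (question, engine) pair for a cycle."""
    return [
        {"cycle": cycle, "question": q, "engine": e, "answer": None, "citations": []}
        for q, e in product(questions, engines)
    ]


baseline = capture_plan(QUESTIONS, ENGINES, cycle="2026-06")
print(len(baseline))  # 2 questions x 5 engines = 10 slots
```

Because every cycle is generated from the same two lists, any change to the question set or the engines is an explicit edit rather than silent drift.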

Doing it: manual or with tools

You can start manually. Run your priority questions through each engine, paste the answers and citations into a spreadsheet, and tag each one for presence, accuracy and the sources used. That alone will tell you more than most organisations know about their AI visibility, and it is enough to establish a baseline and prove movement.
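
If you would rather log to a CSV file than type into a spreadsheet directly, a sketch along these lines would do. The column names and the example row are assumptions for illustration; the only library used is Python's standard `csv` module.

```python
import csv
import io

# Assumed column layout: one row per question, engine and cycle.
FIELDS = ["cycle", "question", "engine", "mentioned", "accurate", "sources", "answer"]


def write_log(rows, handle):
    """Write tagged observations as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical example row.
rows = [{
    "cycle": "2026-06",
    "question": "Who are the leading providers in <category>?",
    "engine": "Perplexity",
    "mentioned": "yes",
    "accurate": "yes",
    "sources": "https://example.com/about; https://example.org/review",
    "answer": "Full pasted answer text goes here.",
}]

# An in-memory buffer keeps the example self-contained; to write a real file,
# use open("geo_log.csv", "w", newline="") instead.
buffer = io.StringIO()
write_log(rows, buffer)
print(buffer.getvalue())
```

The resulting file opens in any spreadsheet application, so the manual tagging workflow stays the same.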

Dedicated AI-visibility tools make the same work repeatable at scale: they run large prompt sets across engines on a schedule and track citations, sentiment and share of voice over time. They are worth it once the question set grows or the cadence tightens. But the tool is not the method. The discipline (the same questions, the same engines and the same cadence, measured against a baseline) is what produces a number you can actually move.

Common measurement mistakes

Measuring once. A single snapshot ages within weeks as models retrain. Without a standing cadence, you are guessing.
Changing the questions each time. If the question set drifts, the trend is meaningless. Fix it, and change it only deliberately.
Counting mentions but ignoring accuracy. Being mentioned wrongly is not a win. Presence without accuracy can be a liability.
Tracking only your own engine of choice. Your audience is spread across several. Measure where they ask, not where you find it convenient.
Reporting a vanity number. “We appear in ChatGPT” is not a measure. Presence, accuracy, source coverage and share of answer, against a baseline, are.

How Morris McLane measures this

For us, measurement is the start of every GEO engagement, not an afterthought. As part of our AI search visibility work, we run a fixed set of your priority questions through ChatGPT, Google’s AI Overviews, Gemini, Perplexity and Copilot on a regular cadence, capture the answers and citations, and baseline where you appear, where you are missing, and where the engines are simply wrong about you. You can see this on your own organisation with a free AI visibility audit: we run the baseline and send you the numbers, at no cost.

From there the metrics drive the work. Gaps in presence point to content and entity work; accuracy problems point to the source-layer corrections that fix what the engines read; thin source coverage points to where to build corroboration. Then we re-measure against the baseline, so progress is visible and drift gets caught before it erodes the gains. The same loop runs continuously, because the answers never stop moving.

The short version

GEO has no ranking, so you measure it by watching the answers. Fix a set of the questions that matter, run them through the engines your audience uses on a regular cadence, and track four things against a baseline: presence, accuracy, source coverage and share of answer. Do that, and “are we visible in AI search?” stops being a guess and becomes a number you can move.

Frequently asked questions

How do you measure GEO when the engines don’t publish rankings?

By observation rather than ranking. You take a fixed set of the questions your audience actually asks, run them through each AI engine on a regular cadence, and record what comes back: whether you are mentioned, whether it is accurate, and which sources the engine drew from. Tracked against a baseline over time, that gives you a moveable measure even though no engine publishes a position.

What metrics matter for AI search visibility?

Four. Presence: are you mentioned or cited at all, and on which engines. Accuracy: is what the engine says correct and current. Source coverage: which pages and third-party sources the answer was built from, and are they yours. Share of answer: how often you appear versus competitors across your priority questions.

How often should you measure your AI visibility?

Regularly enough to catch drift before it entrenches, which usually means a monthly cadence with closer attention around model updates, launches or contested moments. Answers shift as models retrain and competitors publish, so a single snapshot ages quickly.

Can you measure GEO manually, or do you need tools?

You can start manually by running your priority questions through each engine and logging the answers in a spreadsheet. That is enough to baseline. Dedicated AI-visibility tools make it repeatable at scale, tracking citations, sentiment and share of voice across engines over time, but the discipline matters more than the tooling.

How do you measure share of voice in AI answers?

Across your set of priority questions, count how often each brand in your category, including you, is mentioned or cited, then divide your own count by the total across all brands. This is the share of answer measure described above; the term share of voice is also used for it. Run it on the same question set each cycle so the trend is comparable.
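
The counting described here can be sketched in a few lines of Python. The brand names and the sample answers below are hypothetical, and `share_of_answer` is an illustrative helper rather than a standard function.

```python
from collections import Counter


def share_of_answer(answers, brands):
    """Return each brand's share of all brand mentions across the answers.

    `answers` holds, for each captured answer, the set of brands it mentions.
    A brand is counted at most once per answer.
    """
    counts = Counter()
    for mentioned in answers:
        for brand in set(mentioned) & set(brands):
            counts[brand] += 1
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in brands}


# Hypothetical tagging of four captured answers.
answers = [
    {"Us", "RivalA"},
    {"RivalA"},
    {"Us", "RivalA", "RivalB"},
    set(),  # an answer that names nobody in the category
]

shares = share_of_answer(answers, ["Us", "RivalA", "RivalB"])
print(shares)  # Us: 2 of 6 mentions, RivalA: 3 of 6, RivalB: 1 of 6
```

Counting each brand once per answer is a design choice: it measures how many answers you appear in, not how many times an engine repeats your name within one answer.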

How do you know whether your GEO work is improving things?

Compare each cycle against your baseline on the same questions. Improvement shows as rising presence, more accurate mentions, more of the cited sources being yours, and a growing share of answer versus competitors. If those move, the work is landing; if they drift, you catch it early.
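
As a sketch of that comparison, the Python below subtracts baseline values from the current cycle for each of the four measures. The metric names and every figure are invented for illustration; they are not results from any real measurement.

```python
def compare_to_baseline(baseline, current):
    """Return current minus baseline for every metric present in both."""
    return {
        name: round(current[name] - baseline[name], 4)
        for name in baseline
        if name in current
    }


# Hypothetical figures, expressed as fractions between 0 and 1.
baseline = {"presence": 0.40, "accuracy": 0.70, "owned_sources": 0.25, "share_of_answer": 0.18}
current = {"presence": 0.55, "accuracy": 0.80, "owned_sources": 0.25, "share_of_answer": 0.22}

deltas = compare_to_baseline(baseline, current)
for metric, delta in deltas.items():
    direction = "up" if delta > 0 else "down" if delta < 0 else "flat"
    print(f"{metric}: {delta:+.2f} ({direction})")
```

The comparison is only meaningful when both cycles used the same question set and the same engines.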


A short note is enough to start.