ShelfSight · AI Visibility

See your store the way AI shoppers see it

ChatGPT, Perplexity, Gemini, Claude and Google AI Overviews now answer “what should I buy?” Being available to them is not the same as being recommended. Enter your store to get an honest read — in stability counts, never invented numbers.

We run a free audit of your public product feed against the published requirements AI shopping surfaces rely on. No account, no card.

A sample weekly report

illustrative data — Aurora Coffee Gear

Your week at a glance

Across your tracked buying-intent prompts, AI engines recommended you in 3 of the last 5 decisive scans — up from 2 the prior week. Two prompts you previously lost are now contested.

What changed

  • Engines began naming you for “best home espresso machine under $500” — recommended in 3 of 5 scans, up from 0.
  • You are still absent on “quiet coffee grinder for small kitchens”, where a competitor is recommended in 4 of 5 scans.

Do these 3 things

  1. Close the grinder gap. Engines cite a handful of review sites when answering that prompt; getting your grinder reviewed there is the most direct route in.
  2. Fix 2 feed issues. Two products are missing a valid GTIN — AI shopping feeds can reject them. Each is a one-click fix.
  3. Add the comparison your buyers ask about. Track a head-to-head prompt your catalog can win.

_Every number above is a measured stability count — recommended in N of the last M decisive scans — never an invented percentage. “Unsure” answers are excluded, never counted as absence._

How we measure it

the methodology is the product

How ShelfSight Measures AI Visibility

Every claim on this page maps to shipped, tested code. When the methodology changes, this page changes in the same release, and a change never silently rewrites results that were already measured.

What we measure

ShelfSight asks the same buying-intent questions your customers ask — across ChatGPT (OpenAI), Perplexity, Gemini, Claude, and Google AI Overviews — and records whether the answers mention or cite you or named competitors.

How a "mention" is decided

Deterministic rules, strongest signal first:

  1. Citation domain. An answer cites a URL on your domain → counted as mentioned. This is the strongest signal and is never overridden.
  2. Name match. Your brand (or a configured alias) appears as a whole word/phrase in the answer → mentioned. If your brand name is also an everyday word (e.g. a brand called "On"), a bare text match is never counted as a mention without citation corroboration — it's marked unsure instead.
  3. Product-title overlap. Distinctive words from your product titles appearing together are at most unsure, never a confirmed mention.

Anything that doesn't clear these bars is not mentioned. We do not use fuzzy guessing; "smart wool" does not match "Smartwool" unless you configure it as an alias.
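
A minimal sketch of how rules like these could be applied in order, written in Python. It is an illustration, not ShelfSight's shipped code: the function names, the `everyday_names` and `product_tokens` parameters, case-insensitive matching, and the reading of "appearing together" as "every distinctive word of one product appears in the answer" are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

MENTIONED, UNSURE, NOT_MENTIONED = "mentioned", "unsure", "not_mentioned"


def _host(url):
    """Lower-cased hostname of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def _cites_domain(citations, domain):
    """True if any cited URL is on the domain or one of its subdomains."""
    domain = domain.lower()
    return any(_host(u) == domain or _host(u).endswith("." + domain)
               for u in citations)


def _has_phrase(text, phrase):
    """Whole word/phrase match: no letter or digit directly on either side."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(phrase) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def classify(answer, citations, domain, names,
             everyday_names=(), product_tokens=()):
    """Apply the rules strongest signal first (hypothetical helper)."""
    # Rule 1: a citation on your domain is never overridden.
    if _cites_domain(citations, domain):
        return MENTIONED
    # An empty answer is not evidence of absence.
    if not answer.strip():
        return UNSURE
    # Rule 2: whole-word match on the brand or a configured alias.
    everyday = {n.lower() for n in everyday_names}
    matched = [n for n in names if _has_phrase(answer, n)]
    if any(n.lower() not in everyday for n in matched):
        return MENTIONED
    if matched:
        # Only everyday-word names matched, with no citation to back them up.
        return UNSURE
    # Rule 3: product-title overlap is at most unsure.
    for tokens in product_tokens:
        if len(tokens) >= 2 and all(_has_phrase(answer, t) for t in tokens):
            return UNSURE
    return NOT_MENTIONED
```

With this sketch, `classify("Try smart wool socks.", [], "example.com", ["Smartwool"])` returns `"not_mentioned"`, because the exact-phrase match does no fuzzy guessing. Adding `"smart wool"` to `names` as an alias turns the same answer into a mention.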

Why "unsure" exists — and why it never counts against you

AI answers are messy. When we can't decide honestly, we say so: unsure results are excluded from your score's denominator — they are never silently treated as absence (or presence). An engine that returns an empty answer is recorded as unsure, because no answer is not evidence you're invisible.

"Referenced" is a weaker claim than "mentioned"

The per-product view on your dashboard shows where engine answers reference individual products from your catalog. A reference means distinctive words from a product's title appeared together in an answer — the same conservative rule 3 above. It is token-level evidence, deliberately labeled referenced: weaker than a confirmed mention, and never a recommendation count. We show it because knowing which products surface (and which never do) is actionable; we label it honestly because the evidence is circumstantial.

Stability, not fake precision

A single AI answer is one sample from a distribution, because engines vary from run to run. So we never report "your visibility is 73.4%". We report stability: "recommended in 3 of your last 5 decisive scans."

  • Your first scan runs a 3-sample burst per prompt for an instant stability estimate.
  • Ongoing scheduled scans are single runs; stability is computed over the rolling window of your recent scans, since a time series of single runs is itself repeated sampling.
  • When there's no decisive data, your dashboard says "no decisive scans yet" — never a made-up zero.
  • Every result is stored with the engine's model version and timestamp. When engines change models, your history stays exactly as it was measured.
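
The counting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production implementation: the label strings, the function names, and the default window of 5 decisive scans are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def stability(results: Sequence[str], window: int = 5) -> Optional[Tuple[int, int]]:
    """Return (N, M) for "recommended in N of the last M decisive scans".

    `results` is ordered oldest to newest. Each item is one of
    "mentioned", "not_mentioned" or "unsure" (assumed labels).
    Returns None when there is no decisive data at all.
    """
    # Unsure results are dropped before counting, so they never
    # enter the denominator.
    decisive = [r for r in results if r != "unsure"]
    recent = decisive[-window:]
    if not recent:
        return None
    return sum(r == "mentioned" for r in recent), len(recent)


def describe(results: Sequence[str], window: int = 5) -> str:
    """Render the stability count the way a dashboard might word it."""
    counts = stability(results, window)
    if counts is None:
        # Never a made-up zero.
        return "no decisive scans yet"
    n, m = counts
    return f"recommended in {n} of your last {m} decisive scans"
```

A first-scan burst and a rolling series of single runs go through the same function: a burst of `["mentioned", "mentioned", "unsure"]` yields `(2, 2)`, while a history made up only of unsure results yields "no decisive scans yet".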

How the trend line works

The 14-day trend shows, per day, the share of decisive answers that named you — unsure answers are excluded from the denominator, exactly as in your score. Days with no decisive scans show a gap, not a zero. The week-over-week change compares the last 7 days against the 7 before; if either window has no decisive data, we show no comparison rather than a misleading one.
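
As a rough Python sketch of the trend logic, assuming results are grouped by calendar day and labeled "mentioned", "not_mentioned" or "unsure": the names `daily_share` and `week_over_week` are hypothetical, and pooling all decisive answers within each 7-day window (rather than averaging daily shares) is an assumption of this example.

```python
from datetime import date, timedelta


def _share(labels):
    """Share of decisive answers that named you, or None if none are decisive."""
    decisive = [r for r in labels if r != "unsure"]
    if not decisive:
        return None
    return sum(r == "mentioned" for r in decisive) / len(decisive)


def daily_share(by_day, end, days=14):
    """Per-day share for the `days` days ending at `end`, oldest first.

    `by_day` maps a date to the list of result labels recorded that day.
    A day with no decisive scans yields None (a gap), never 0.
    """
    return [
        (end - timedelta(days=offset),
         _share(by_day.get(end - timedelta(days=offset), [])))
        for offset in range(days - 1, -1, -1)
    ]


def week_over_week(by_day, end):
    """Change in share: last 7 days minus the 7 days before.

    Returns None when either window has no decisive data.
    """
    def window(last_day):
        labels = []
        for i in range(7):
            labels.extend(by_day.get(last_day - timedelta(days=i), []))
        return _share(labels)

    current = window(end)
    previous = window(end - timedelta(days=7))
    if current is None or previous is None:
        return None
    return current - previous
```

Returning `None` instead of `0` keeps the charting layer honest: a gap can be drawn as a break in the line, and a missing comparison can be omitted rather than shown as "no change".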

An honest caveat about APIs

We query engines through their public APIs with web search/grounding enabled. API responses are a statistical proxy for what consumers see in the apps — close, current, and reproducible, but not pixel-identical. This is standard practice across the AI-visibility industry; we state it plainly because the alternative (pretending otherwise) would be dishonest.

The feed audit has no proxy problem: it deterministically checks your actual product data against the published requirements of the Agentic Commerce Protocol (the feed spec AI shopping surfaces ingest). Every finding shows the exact rule and the exact fix.
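
To show what a deterministic feed check looks like, here is a Python sketch of one rule: flagging products whose GTIN is missing or fails the standard GS1 check-digit test. The check-digit arithmetic is the published GS1 mod-10 scheme. The `id` and `gtin` field names and the finding messages are assumptions for the example, not the protocol's actual field names or ShelfSight's actual output.

```python
def is_valid_gtin(value: str) -> bool:
    """Validate a GTIN-8, GTIN-12, GTIN-13 or GTIN-14 by its GS1 check digit."""
    if not (value.isascii() and value.isdigit()) or len(value) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in value]
    body, check = digits[:-1], digits[-1]
    # Walking right to left from the digit next to the check digit,
    # weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def audit_gtins(products):
    """Return (product id, finding) pairs for missing or invalid GTINs.

    `products` is a list of dicts with hypothetical "id" and "gtin" keys.
    """
    findings = []
    for product in products:
        gtin = (product.get("gtin") or "").strip()
        if not gtin:
            findings.append((product["id"], "missing GTIN"))
        elif not is_valid_gtin(gtin):
            findings.append((product["id"], "invalid GTIN check digit or length"))
    return findings
```

The same input always produces the same findings, which is what separates this kind of check from the sampled engine answers described above.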

What your reports will never do

The weekly action report is written by an LLM under a hard contract: it may only restate the stability counts, product references, and audit findings above, may never invent numbers or causes, and must say "no decisive data yet" when that's the truth. The contract is enforced in the prompt and covered by tests.
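
One way a test could guard the "never invent numbers" clause is to compare every numeral in a generated report against the numerals present in the facts the report was given. The Python sketch below is a hypothetical illustration of that idea, not a description of ShelfSight's actual test suite. It is also only a partial check: it catches a number that appears nowhere in the inputs, but not an allowed number used in the wrong context.

```python
import re

# Matches integers and simple decimals such as "3" or "73.4".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def invented_numbers(report, allowed):
    """Return numerals found in `report` that are not in the `allowed` set.

    `allowed` holds the numerals (as strings) taken from the stability
    counts, product references and audit findings given to the writer.
    """
    return [n for n in _NUMBER.findall(report) if n not in allowed]
```

For example, with `allowed = {"3", "5"}`, the sentence "Recommended in 3 of the last 5 decisive scans." passes with an empty result, while "Visibility is 73.4%, recommended in 3 of 5." returns `["73.4"]`.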