China AI visibility · How-to reference

Competitor AI visibility audit — how to benchmark your category on AI engines

A working method for auditing how your competitors show up inside ChatGPT, Claude, Perplexity, DeepSeek, Qwen and Doubao when consumers ask category-relevant questions. The four questions to answer, the DIY prompt-panel walkthrough, where the DIY method breaks at scale, and where Eastbound's free audit picks up the load.

Last reviewed 2026-05-10. Methodology grounded in published GEO measurement research; per-engine numbers from our own zh-CN consumer panels.

Why competitor benchmarking on AI engines is different from competitor SEO

Competitor SEO benchmarking is a solved problem. Pull a SERP for a target keyword, list the top 10 domains, look at their backlinks, anchor text, content depth and on-page signals. The output is a finite ranked list, the same for everyone running the same query in the same locale.

Competitor AI visibility benchmarking does not work like that. The output is a generated paragraph that may or may not name brands, may or may not cite sources, and varies run-to-run for the same prompt on the same engine. The two-stage citation framework from Zhang Kai & Yao Jingang's 2026 measurement paper (arXiv:2604.25707v1) — covered in our methodology — separates citation selection (does the engine pull the page into its source pool?) from citation absorption (does the page actually shape the answer?). A competitor can be selected often without being absorbed; absorbed without being mentioned by name; mentioned by name without being recommended.

That structural difference is why "look at the SERP" does not transfer. You are not auditing a ranked list. You are auditing how a generative engine assembles a category narrative — and which competitors are inside that narrative versus invisible to it. Aggarwal et al. (KDD 2024, n=10,000 queries) found the strongest single content tactic — adding authoritative third-party citations — lifted citation rate by +115%. The competitor question is therefore not "who ranks highest" but "whose third-party source-graph is dense enough that the engine keeps surfacing them."

The reframe. Competitor AI visibility is a brand-side question, not a SERP-side question. You are measuring whose mentions, citations and source-associations show up inside the answer — not whose URL ranks above whose.

The four questions a competitor AI visibility audit must answer

A useful audit answers four questions, in order. Skipping any of them produces output that looks rigorous but does not tell the brand what to do next.

1. Which competitors get cited at all?

The first cut is simple presence. For 20–40 category-relevant prompts, who shows up by name in the generated answer? Many brands assume their top-3 SEO competitors are also their top-3 AI competitors. They often are not. AI engines tend to surface brands with strong third-party source presence (Wikipedia, Reddit, vertical media, in China the Baike / Zhihu / Xiaohongshu / SMZDM stack), and that presence does not match SEO ranking 1:1.

2. In what context are they cited?

Presence is not the same as positioning. A competitor named alongside "premium", "trusted", or "category-defining" is in a different position from one named alongside "cheap alternative" or "budget option". The frame the engine assigns matters more than the count of mentions. Capture the surrounding sentence verbatim, not just the brand name.

3. What sources do the engines pull about each competitor?

This is the deepest question and the one most audits skip. Each engine self-attributes some sources when asked to ground its answer. Extract those sources per competitor and you have a map of where the engine "learned" about that brand. A competitor whose mention is anchored on Wikipedia, vertical-trade publications and major review sites is harder to displace than one whose mention is anchored on a single press release and a forum thread.

4. How does our brand compare on each of the above?

Only after the first three are answered can the brand-side gap analysis happen. The output is a per-engine, per-competitor table with three columns: mention rate, framing, source-graph density. Three numbers per cell, not one.

The DIY method — a manual prompt-panel walkthrough

You can do a usable first-pass audit in a single afternoon with three engines and a spreadsheet. Here is the working method.

Step 1: Define the category prompt set

Write 20–40 prompts a real consumer or buyer would ask their AI assistant when researching your category. Mix levels of specificity — broad ("best business hotel loyalty programme for Asia travel"), niche ("best hotel programme for status-matching with Marriott Bonvoy at Platinum tier"), and decision-stage ("compare IHG One Rewards versus Hilton Honors for free-night value"). Avoid leading prompts that name your brand — you want to see who the engine surfaces unprompted.

Step 2: Run each prompt across the engines you care about

For Western-facing brands: ChatGPT, Claude, Perplexity, Gemini, Google AI Overview. For Mainland-China-facing brands: DeepSeek, Qwen (Alibaba's Tongyi), Doubao (ByteDance), and ideally Yuanbao or Kimi as a secondary check. Each engine, each prompt, run twice on different days to get a feel for variance. Save the full text of each answer plus any cited sources.

Step 3: Extract three things per answer

Brand mentions. Every named brand in the answer, plus the sentence around it (positioning frame).
Cited sources. Every URL, publication name or platform the engine self-attributed.
Absences. The brands you expected to see that did not appear. The negative space is half the finding.

Step 4: Build the per-engine, per-competitor matrix

Rows: every competitor named at least twice in your run. Columns per engine: mention rate, framing summary (3–5 word descriptor), top-5 cited sources. Add a row for your own brand. The visible gap — which engines surface you, which do not, and which competitors anchor each engine's narrative — is the audit's headline finding.

Step 5: Re-run a subset for variance check

The same prompt can return materially different brand sets across runs. To distinguish signal from run-to-run noise, pick the five most important prompts and run them five times each. If a competitor shows up in 4 of 5 runs, that is a stable surface; if 1 of 5, treat as noise.

Where the DIY method breaks

The afternoon-with-a-spreadsheet method gets you a first impression. It does not get you a defensible benchmark you can take to a leadership review. Three things break at scale.

Sample size

Twenty to forty prompts run twice each is 40–80 data points per engine. That is enough to spot the largest gaps and nothing else. Reliable rank-ordering of mid-tier competitors needs a panel in the hundreds. Our own panels run 540 calls and up — see, for example, the source-overlap analysis in DeepSeek vs Qwen vs Doubao: Why Brands Look Different, where DeepSeek/Qwen/Doubao share only 20–30% of cited sources (top-15 Jaccard) on the same prompt set. With 40 prompts, you cannot detect that gap; the engines look more similar than they are.

Prompt-engineering bias

Prompts that you write yourself in your second language tend to under-represent how real consumers phrase the question. This matters less for English-language audits and very much more for zh-CN audits, where the natural-language phrasing patterns of Mainland Chinese consumers differ from translated English prompts. A prompt that reads correctly in English but back-translates from a Western frame will surface a different brand set than the consumer-voice equivalent.

No statistical reliability

A two-run-per-prompt panel does not give you reliability statistics — top-5 source-membership stability (κ_top-5), top-15 stability (κ_top-15), or run-to-run mention-rate ICC. Without those, you cannot tell whether a finding is robust or an artefact of one engine having a bad day. Our methodology page reports per-engine reliability (DeepSeek κ_top-15 = 0.89; Qwen 0.78; Doubao 0.46 on the granular long tail) precisely because the long-tail noise on Doubao is large enough to mislead anyone working with too few samples.

Net. The DIY method is good for a first impression and a leadership conversation about whether to invest further. It is not good enough to commit a six- or seven-figure brand-positioning decision against. If the gap matters operationally, run a panel that has reliability statistics attached.

Where the Eastbound free audit picks up

Eastbound's free audit applies the methodology described above on a stratified zh-CN consumer prompt panel for your URL plus the obvious competitor set, across DeepSeek, Qwen and Doubao. What changes versus the DIY pass:

Stratified panel design. L1 broad-category prompts plus L2 positioning-niche prompts, written in Mainland-Chinese consumer voice rather than translated English. Multi-turn for travel and hospitality categories where single-shot panels under-report the funnel.
Top-15 source extraction per engine. Not just the top-5 that everyone reports — the top-15 is where the long-tail differences actually live. Wikipedia surfaced in 21% of DeepSeek responses across our luxury-handbag panel (per Wikipedia in China AI visibility); the corresponding number on Qwen and Doubao is materially different, and the long tail diverges further still.
Per-engine reliability stats. κ_top-5 and κ_top-15 reported separately, plus run-to-run Pearson r on mention rates. We disclose where the noise is, not just the headline number.
Per-stage breakdown. Selection, absorption, mention reported separately rather than collapsed into a single visibility score, because the fix for each stage is different.

The free audit gives you the structure of the gap. If the structure is large enough that the brand wants to act on it, the paid measurement engagement runs the full panel (multi-week, hundreds of prompts, full reliability discipline) and produces the artefact you can take to a brand-positioning review.

What you actually do with the findings

A competitor AI visibility audit is not the deliverable. The deliverable is the next quarter's source-graph plan. Three patterns we see most often:

Pattern A: competitor anchored by sources you can also reach

The competitor's mention is anchored on Wikipedia, vertical-trade publications, and major review sites — surfaces your brand can also pursue. The plan is a multi-quarter source-graph build: pitched coverage, contributed analysis, encyclopedic-anchor work. Aggarwal et al.'s headline finding — third-party citation roughly 6.5× more effective than self-citation — is the structural reason this pattern is the most common winnable shape.

Pattern B: competitor anchored by sources you cannot reach quickly

The competitor's mention is anchored on academic citations, regulatory filings, or platform-specific community presence (Reddit threads, Zhihu Q&A, Xiaohongshu posts) that take time to seed. The plan is split: short-cycle wins on the surfaces you can reach, plus a 2–4-quarter community-seeding programme on the slower surfaces.

Pattern C: white space — competitors absent from these engines entirely

None of the named competitors are surfaced reliably. The category is currently a land-grab window. The plan is fast-cycle: an English-language reference page, a primary research piece with a quotable headline number, and a small set of placement pitches before the window closes. This is the highest-ROI pattern and the rarest.

Limits of the methodology — what it does not measure

For completeness, the things a competitor AI visibility audit deliberately does not measure:

Sales attribution. The audit measures selection, absorption and mention. It does not measure whether the mention drives a click, a consideration-set entry, or a transaction. AI mention is a brand-visibility signal, not a sales-attribution signal. We measure surface, not epistemic reliance and not revenue.
Training-corpus composition. We can only observe what each engine self-attributes when answering. Self-attribution is not the same as the underlying training data. Claims of the form "AI trusts X" or "X was absorbed during training" are not what the methodology measures.
Brand-recall persistence over time. The audit is a snapshot. Whether a mention pattern observed in May still holds in November depends on engine source-pool drift, which we measure with periodic re-runs but cannot predict from one panel alone.
Western-engine measurement at the same depth. Eastbound's research and consultancy work focuses on the Mainland-China surface (DeepSeek, Qwen, Doubao). The same methodology can be applied to Western engines, but our panels and reliability stats are anchored on the Chinese surface.

Run the free competitor audit

The free Eastbound audit runs your URL plus the competitor set against a stratified zh-CN consumer prompt panel across DeepSeek, Qwen and Doubao, and reports per-engine selection, absorption and brand-mention scores with top-15 source extraction and reliability stats. No login.

Run AI visibility audit or book a 30-minute fit check.