# How Eastbound Measures China AI Visibility — Methodology

> Two-stage citation framework, stratified zh-CN prompt panels, per-engine reliability stats, and the measured / hypothesis / intervention-test labelling system.

Published: 2026-05-05
Site: https://www.eastbound.ai/methodology/

## Two-stage citation framework

Grounded in Zhang Kai & Yao Jingang's 2026 GEO measurement paper (arXiv:2604.25707v1):

1. **Citation selection** — does your domain enter the engine's source pool?
2. **Citation absorption** — does the page actually shape the answer language?
3. **User-visible mention** (third stage, downstream of both) — does the answer name your brand and how prominently?

A page can be selected often but absorbed weakly. A brand can be absorbed but mentioned only long-tail. We report each stage separately because the fix differs.

## Prompt-panel design

Stratified zh-CN consumer-voice prompts at two levels:

- **L1 — broad category.** "best moisturiser for sensitive skin", "carry-on luggage under 1.5kg".
- **L2 — positioning niche.** "Korean-style hyaluronic moisturiser for over-thirties", "polycarbonate hardshell with TSA lock under HK$1,500".

Each prompt repeated multiple times per engine to control run-to-run variance. Consumer-voice prompts kept distinct from developer/B2B prompts (DeepSeek's developer-corpus weight differs).

Travel/hospitality categories use **multi-turn panels** — single-shot panels under-report recommendation funnels.

## Engine endpoints

| Engine | API | Default model |
|---|---|---|
| DeepSeek | api.deepseek.com | deepseek-chat |
| Qwen | DashScope international (`dashscope-intl.aliyuncs.com/compatible-mode/v1`) | qwen-plus |
| Doubao | BytePlus ModelArk international (`ark.ap-southeast.bytepluses.com/api/v3`) | model IDs logged at session start + end |

**Two caveats published loudly:**
- Provider labels commonly confused: Qwen on DashScope (Alibaba). Doubao on BytePlus (ByteDance). Different engines on different infra.
- Neither endpoint exposes pinned-version handles. Test-retest reliability runs (below) control for snapshot drift.

## Reliability discipline

Test-retest stats reported in every readout:

- **κ_top-5** — top-5 source membership stability. All three engines: **1.00** (perfectly stable).
- **κ_top-15** — top-15 stability. **DeepSeek 0.89, Qwen 0.78, Doubao 0.46.** Doubao long-tail granular-tag normalisation issue documented explicitly.
- **Pearson r and ICC(2,1)** — source mention rates correlated **r 0.97–0.99 / ICC 0.97–0.99** on identical 30-prompt re-runs across all three engines.

Selective reporting (only κ_top-5, hiding κ_top-15) is a red flag. We disclose Doubao κ_top-15 = 0.46 even though it's the harder story.

**Sample-size discipline.** Always report n, panel coverage, category. SMZDM 72% in our handbag panel does NOT transfer to watches/luggage, and collapses to 33% at ultra-luxury price tier within handbags.

## Recommendation labelling

Every public recommendation labelled:

1. **Measured evidence** — observed in our panel. n + panel + engines stated. Limitations disclosed (single-LLM vs multi-LLM, descriptive vs causal).
2. **Prior-knowledge hypothesis** — consistent with published research (Aggarwal et al. KDD 2024, Zhang Kai & Yao Jingang arXiv 2604.25707v1, geo-citation-lab, Tw93). Cited with attribution; framed as hypothesis.
3. **Planned intervention test** — expected to help; before/after measurement required to confirm. We design the test, set the measurement date, report null/negative outcomes honestly.

We do not collapse the three. Marketing pressure tends to convert cited research into "we proved this" — Eastbound refuses the conversion.

## What we do NOT claim

- Do not inspect any engine's training corpus. Self-attribution ≠ training-data composition.
- Do not measure ChatGPT / Claude / Gemini / Perplexity with this methodology. Separate measurement against English-language panel required.
- Do not measure sales / conversion / attributable revenue. AI mention is brand-visibility signal.
- Do not claim JSON-LD makes Chinese AI find you. Bing/Copilot signal in our sample.
- Do not promise ranking positions. Fast layer = days-weeks. Compounding layer = quarters.

## Run the audit

- Free multi-engine audit: https://www.eastbound.ai/ai-visibility-audit/
- Pillar: https://www.eastbound.ai/china-ai-visibility/
- Research: https://www.eastbound.ai/blog/
- Free AI visibility audit: https://www.eastbound.ai/ai-visibility-audit/