China AI visibility · How-to playbook
How to improve brand visibility in AI search engines
Improving brand visibility in AI search engines breaks into three layers. Layer 1 (technical infrastructure) is a 1-hour job. Layer 2 (content design) is a multi-week job. Layer 3 (off-site source-graph) is a multi-quarter job that compounds. Each layer is backed by its own published evidence, and three of the most common tactics — FAQPage schema, "increase content length", and JSON-LD as a universal AI signal — do not appear in the validated set. Here is the working playbook.
Last reviewed 2026-05-10. Tactics drawn from Aggarwal et al. KDD 2024, Zhang & Yao 2026, SE Ranking 2025, Williams-Cook 2026, and our own 540-call panel.
The framework — three things to optimise, in order
Generative AI engines do not work like Google's blue-link results. The clearest published model — Zhang Kai & Yao Jingang 2026 (arXiv:2604.25707v1) — separates the process into citation selection (does the engine retrieve your page into its source pool?) and citation absorption (does the page's language, structure or facts actually shape the answer the user reads?). User-visible mention is a third stage downstream of both.
Tw93's 2026 instrumentation of ChatGPT made the gap concrete: the engine retrieves roughly 100 pages per query, but only ~15% surface in the answer. The other 85% are selected but not absorbed. So three different metrics, with three different optimisation tactics for each. Get them in the right order.
- Layer 1 — Technical infrastructure. Without this, your pages are not selected. Hours of work.
- Layer 2 — Content design. Without this, your selected pages are not absorbed. Weeks of work.
- Layer 3 — Off-site source-graph. Without this, your absorbed pages are referenced rarely. Quarters of work, compounding.
Layer 1 — Technical infrastructure (1 hour)
The selection floor. If your pages are not crawlable, indexable, and parseable by the AI bots, none of the rest matters. Vercel's 2025 crawler study confirmed GPTBot, ClaudeBot and PerplexityBot fetch raw HTML and do not execute JavaScript. The implications are concrete:
1. Granular robots.txt
Most sites have one User-agent: * block and call it done. Modern AI engines run multiple user agents — separate ones for training crawl, retrieval crawl, and user-triggered fetches. AI crawler readiness is the configuration reference: explicit Allow rules for OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, etc. Block training-only bots if you want; do not accidentally block retrieval bots that drive citation.
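A minimal robots.txt sketch of the pattern described above. The bot names should be verified against each vendor's current documentation before deploying, and the training-bot policy at the end is a choice, not a recommendation:

```
# Retrieval and user-triggered bots that drive citation: allow explicitly
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Training-only crawler: block or allow per your data policy
User-agent: GPTBot
Disallow: /
```

The key failure mode is the reverse of this file: a blanket `Disallow` that catches the retrieval bots along with the training ones.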
2. llms.txt and llms-full.txt
A positive index pointing AI engines at the content you want them to read. Different problem from robots.txt (negative gate). See llms.txt vs robots.txt for the disambiguation, and Markdown alternates guide for serving per-page Markdown alongside HTML.
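A minimal llms.txt sketch following the proposed convention (H1 site name, blockquote summary, then curated link sections). The URLs and section names here are placeholders, not a prescription:

```
# Example Co
> What the site covers, in one line an AI engine can reuse.

## Reference pages
- [Product overview](https://example.com/overview.md): what the product does and for whom
- [Pricing](https://example.com/pricing.md): current plans and limits

## Guides
- [Getting started](https://example.com/start.md): first-hour setup
```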
3. JavaScript-only rendering kills you
GPTBot / ClaudeBot / PerplexityBot fetch raw HTML with no JS execution. SPA frameworks that hydrate client-side leave the bots with empty pages. Server-side render anything you want cited. Diagnostic: fetch your URL with curl, which runs no JavaScript; if the response body doesn't contain your copy, the bots don't see it either.
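The diagnostic can be scripted: strip tags from the raw HTML and count the words a non-JS crawler would actually see. This is an illustrative sketch, not any engine's actual parser:

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a non-JS crawler sees, skipping script/style content."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(raw_html: str) -> int:
    """Words in the body copy as fetched, before any JavaScript runs."""
    p = VisibleText()
    p.feed(raw_html)
    return sum(len(c.split()) for c in p.chunks)


# A client-side SPA shell vs. a server-rendered page:
spa = "<html><body><div id='root'></div><script>render()</script></body></html>"
ssr = "<html><body><h1>Guide</h1><p>Server-rendered body copy here.</p></body></html>"
```

Run it against the HTML your server returns: a count near zero means the bots are getting an empty page.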
4. IndexNow, sitemap.xml, canonical tags
IndexNow (Bing/Copilot) accelerates indexing on every content change. Sitemap covers Google AI Overview's underlying Google index. Canonical tags prevent duplicate-content splits. IndexNow setup guide covers the deployment.
5. The seven blocking mistakes
Most teams trip on overbroad robots.txt rules, Cloudflare bot-fight mode, JavaScript-only rendering, geo-blocking, login walls, slow render times, or CSP misconfigurations that block crawlers. AI crawler blocking mistakes walks through each.
Layer 2 — Content design (multi-week)
The absorption layer. Your pages are now reachable; the question is whether their language and structure get extracted into answers. Aggarwal et al., KDD 2024 ran a 10,000-query benchmark across nine optimisation tactics and reported the headline numbers most of the field now cites. Three tactics produced statistically reliable lifts:
| Tactic | Citation lift |
|---|---|
| Adding authoritative citations to your page | +115% |
| Adding direct quotes from credible sources | +43% |
| Adding relevant statistics with named sources | +33% |
Notably absent from the validated set: FAQ format, FAQPage schema, generic "increase content length" advice. The absence is itself a finding — these are the most common heuristics in the field that the academic measurement work has not validated.
1. Length — the 1,000–3,000-word sweet spot
Cross-study consensus places the sweet spot at 1,000–3,000 words per reference page with 10+ headings. Below 500 words, pages function as snippets that rarely match a substantive prompt. Above 3,000 words, marginal value falls and the editorial cost of keeping the page accurate compounds. Low-cited pages average 170 words in published samples; high-cited pages average ~2,000 — a more-than-10× gap.
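The word and heading targets above are easy to automate as a pre-publish check. A rough sketch assuming Markdown source, with thresholds taken straight from this section:

```python
import re


def page_stats(markdown_text: str) -> dict:
    """Rough length/structure audit against the 1,000-3,000-word, 10+-heading target."""
    words = len(markdown_text.split())
    headings = len(re.findall(r"^#{1,6} ", markdown_text, flags=re.M))
    return {
        "words": words,
        "headings": headings,
        "in_sweet_spot": 1000 <= words <= 3000 and headings >= 10,
    }
```

A failing check doesn't mean pad the page; it means the page may be a snippet that will rarely match a substantive prompt.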
2. Specificity beats fluency
The strongest single predictor across studies is semantic similarity between page content and user query. Pages with real numbers, dated comparisons, named entities and clear definitions are cited 50%+ more than vague pages making the same claim. Step-structured content (numbered procedures, decision trees) outperforms prose summaries.
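"Specificity" can be approximated with crude counting: numbers, years, capitalised entity names. A heuristic sketch for triaging vague pages, not a validated metric from any of the cited studies:

```python
import re


def specificity_signals(text: str) -> dict:
    """Count concrete-detail signals as a rough proxy for specificity."""
    return {
        "numbers": len(re.findall(r"\b\d[\d,.%]*\b", text)),
        "years": len(re.findall(r"\b(?:19|20)\d{2}\b", text)),
        # Capitalised multi-word runs as a crude named-entity proxy
        "entities": len(re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text)),
    }
```

Pages scoring near zero on all three are the "vague pages making the same claim" this section warns about.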
3. Encyclopedia-style explainer pages outperform news
Wikipedia-style "what is X / how does X work" pages have roughly 3× the influence per citation of news pages in published samples. The mechanism: an explainer page is reusable across many prompts; a news page is locked to a single window of relevance.
4. What does NOT work
- FAQPage schema. SE Ranking's 129K-domain × 216K-page analysis (covered in Search Engine Journal, 2025) found FAQ-schema pages averaged 3.6 ChatGPT citations versus 4.2 without. Williams-Cook's 2026 controlled test confirmed FAQPage JSON-LD confers no extraction advantage over visible Q&A copy. Skip it.
- JSON-LD as a universal AI signal. Williams-Cook's fake-schema test showed ChatGPT and Perplexity tokenise JSON-LD as plain text without structural parsing. Bing/Copilot is the confirmed exception — Microsoft's Fabrice Canel publicly confirmed Bing uses schema for Copilot grounding (SMX Munich, March 2025). Keep schema for that bonus; do not invest expanding it as primary tactic.
- Padding to "increase content length." Length only helps when each chunk adds independent information. Padding lowers signal-to-noise and reduces citation rates.
- User-Agent sniffing to serve different content to bots. Cloaking. Penalised.
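Given the Bing/Copilot exception above, the one schema investment still worth keeping is minimal. A sketch of a bare Organization block (names and URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "sameAs": ["https://en.wikipedia.org/wiki/Example_Co"]
}
```

Maintain something this small for the Copilot grounding bonus; anything larger is the over-investment the section warns against.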
Layer 3 — Off-site source-graph (multi-quarter)
The compounding moat, and the highest-leverage layer in the published research. This is the structural reason brand visibility in AI search is a brand-presence play, not a content-marketing play. The infrastructure work (Layer 1) is a one-hour layer. The content work (Layer 2) is a multi-week layer. The third-party source-graph work — Wikipedia, Reddit, Hacker News, vertical media for Western engines, and the Mainland Chinese stack (百度百科, 知乎, 微信公众号, 36氪, 虎嗅, 小红书, SMZDM) for Chinese engines — is a multi-quarter layer that compounds.
Western source-graph priorities
- Wikipedia — surfaced in 21% of DeepSeek responses across our luxury-handbag prompt panel; the highest-mentioned Western source in the sample. See Wikipedia AI visibility.
- Reddit, Hacker News, GitHub — community trust signals. ChatGPT and Perplexity weight Reddit unusually heavily.
- YouTube — surfaced in 20% of DeepSeek responses despite the geo-block. See YouTube AI visibility.
- Vertical industry publications — Search Engine Land, Search Engine Journal, Ahrefs blog, Semrush blog cite each other constantly; getting placed there compounds.
Chinese source-graph priorities (different ecosystem entirely)
- 百度百科 — Mainland encyclopedia anchor. Different rules from Wikipedia.
- 知乎 — long-form Q&A. See Zhihu AI visibility.
- 小红书 (Xiaohongshu) — lifestyle and B2C. See Xiaohongshu AI visibility.
- SMZDM (什么值得买) — commerce aggregator with high citation rate at aspirational tiers, low at ultra-luxury. See SMZDM AI visibility.
- WeChat 公众号 + 36氪 / 虎嗅 — Mainland tier-2 vertical media.
For the long version see Traditional SEO won't get you into Chinese AI answers — the 3,562-word study on which third-party signals matter for Chinese engines specifically.
How to measure progress
Three metrics, each measured separately:
- Selection rate. What percentage of relevant prompts pull your domain into the engine's source pool? (Top-15 cited sources.)
- Absorption rate. What percentage of selections produce extractable content in the answer? (Engine quotes you, paraphrases your phrasing, or pulls a number from your page.)
- Mention rate. What percentage of relevant prompts result in a user-visible mention of your brand name?
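A sketch of scoring the three rates from a panel of calls. The record shape is hypothetical, but the conditional structure (absorption measured only over selected calls) follows the two-stage selection/absorption model:

```python
from dataclasses import dataclass


@dataclass
class PanelCall:
    """One prompt x engine call, scored against your domain and brand."""
    selected: bool    # domain in the engine's source pool (top-15 cited)
    absorbed: bool    # quote, paraphrase, or number traceable to your page
    mentioned: bool   # brand name visible in the answer text


def rates(calls: list[PanelCall]) -> dict:
    """Selection, absorption (conditional on selection), and mention rates."""
    n = len(calls)
    sel = [c for c in calls if c.selected]
    return {
        "selection_rate": len(sel) / n,
        "absorption_rate": sum(c.absorbed for c in sel) / len(sel) if sel else 0.0,
        "mention_rate": sum(c.mentioned for c in calls) / n,
    }
```

Tracking the three separately tells you which layer to work: low selection points at Layer 1, low absorption at Layer 2, low mention at Layer 3.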
Eastbound's measurement methodology details how we measure all three — stratified zh-CN consumer prompt panels, top-5 + top-15 reliability stats (κ), measured / hypothesis / planned-test labels for every claim. The free AI visibility audit runs your domain through this against DeepSeek, Qwen and Doubao.
Realistic timeline
| Layer | Time investment | Compounding? | Single biggest blocker |
|---|---|---|---|
| 1. Technical infrastructure | 1 hour to 1 day | No — set-and-forget | JavaScript-only rendering |
| 2. Content design | 2–8 weeks per reference page | Per-page yes; cross-page yes for evergreen pages | Editorial discipline |
| 3. Off-site source-graph | 1–3 quarters minimum | Yes — strongly compounding | Cannot be bought; relationship-driven |
Most brands rush Layer 2, skip Layer 3, and never circle back to Layer 1. The opposite order is correct: Layer 1 first, Layer 2 second, Layer 3 last but plan it from day 1.
China is a separate execution problem
The framework above (selection / absorption / mention) is engine-agnostic. The platforms and source-graph in Layer 3 are not. In our 540-call panel (May 2026), top-15 cited-source overlap (Jaccard) between any two Chinese engines was 0.20–0.30 — and overlap between Western and Chinese engines is lower still. A source-graph plan built for ChatGPT cannot be ported to DeepSeek without rebuilding from scratch — different language, different platforms, different community norms. See China AI visibility for global brands for the dedicated Chinese-engine treatment.
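Jaccard overlap on top-15 cited-source sets is simple to reproduce for your own panel. A sketch with illustrative domains (not our panel data):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two engines' cited-source sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative top-cited domains for two Chinese engines (hypothetical)
deepseek_sources = {"zhihu.com", "baike.baidu.com", "xiaohongshu.com", "36kr.com"}
qwen_sources = {"zhihu.com", "baike.baidu.com", "huxiu.com", "smzdm.com"}
```

Two shared domains out of six total gives 0.33, the top of the 0.20-0.30 band measured in the panel; values that low mean each engine needs its own source-graph plan.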
Run the audit, then work the layers
The free Eastbound audit reports your selection / absorption / mention scores against DeepSeek, Qwen and Doubao on a stratified zh-CN consumer prompt panel. From there the next move is concrete.