China AI visibility · Insight · Comparison

DeepSeek vs Qwen vs Doubao: three engines, three source ecosystems.

The three Chinese answer engines US and UK brands need to monitor — DeepSeek, Qwen and Doubao — do not surface the same evidence when answering Mainland-Chinese consumer questions. Below is a measured comparison across a 540-call panel: source-mix differences, cross-engine overlap, and the reliability of each finding.

Eastbound research · 540 calls · 30 prompts × 3 LLMs × 3 reps × 2 turns · May 2026

The headline finding

Across our 540-call source-influence panel run in May 2026, DeepSeek, Qwen and Doubao cited Mainland-CN sources at materially different rates, drew on different secondary surfaces, and showed different long-tail stability. Cross-engine source overlap (Jaccard top-15) was 0.20–0.30 — they are not interchangeable, and a measurement framework built for one cannot be ported to another without rebuilding the source-substrate model.

For a global brand asking "which Chinese AI engine should we optimise for?", the honest answer is: all three, separately, because the work is different on each. The rest of this page lays out exactly how it differs.

Mainland-CN source share by engine

The first axis of difference is the share of citations each engine draws from Mainland-Chinese platforms versus Western platforms (Wikipedia, YouTube, Reddit, vertical media):

DeepSeek 72.3%
Qwen 85.0%
Doubao 88.6%

Share of citations attributed to Mainland-CN platforms across 540 calls. Descriptive measurement — not causal, and not a claim about training corpora.
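The share metric above can be reproduced from logged citations with a short sketch. The platform names and the `MAINLAND_CN` classification set below are illustrative placeholders, not Eastbound's actual classification table:

```python
# Sketch: per-engine Mainland-CN citation share from logged calls.
# MAINLAND_CN is an illustrative classification set, not the real list.
MAINLAND_CN = {"xiaohongshu", "bilibili", "smzdm", "zhihu", "baidu-baike"}

def mainland_share(citations_by_engine):
    """citations_by_engine: {engine: [platform_id, ...]} pooled over all calls."""
    shares = {}
    for engine, cites in citations_by_engine.items():
        cn = sum(1 for c in cites if c in MAINLAND_CN)
        shares[engine] = cn / len(cites) if cites else 0.0
    return shares

calls = {
    "deepseek": ["wikipedia", "youtube", "zhihu", "bilibili"],
    "doubao": ["xiaohongshu", "smzdm", "bilibili", "wikipedia"],
}
print(mainland_share(calls))  # -> {'deepseek': 0.5, 'doubao': 0.75}
```

The real panel pools 540 calls per engine; the toy input here just shows the shape of the computation.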

DeepSeek is the most Western-balanced of the three. Its Western surface is community-led: Wikipedia 21%, YouTube 20%, Reddit secondary but consistent. For a US or UK brand whose Mainland source-graph presence is not yet mature, DeepSeek is often the first engine to surface it, because community-led Western evidence already exists for most established brands.

Qwen sits in the middle at 85% Mainland-CN. Its Western secondary surface is institutional rather than community: regulatory documents, professional associations, academic publications. Where DeepSeek pulls Reddit, Qwen pulls IEEE / arXiv / ministry whitepapers. This makes Qwen the most consequential engine for brands in regulated categories — pharma, medical device, financial services, education, food safety.

Doubao is the most CN-substrate-biased at 88.6%. Its within-CN mix tilts toward commerce and lifestyle aggregators — SMZDM, Xiaohongshu, Bilibili surface at higher rates here than on either of the other two engines. For aspirational consumer goods, FMCG, beauty and travel categories, Doubao is often the engine where category-level brand surfacing is most decisive.

Cross-engine source overlap (top-15 Jaccard)

Source share is the within-engine view. The cross-engine view asks: of the top-15 sources each engine cites, how many are shared? The answer is "not many":

           DeepSeek   Qwen   Doubao
DeepSeek      —       0.30    0.20
Qwen         0.30      —      0.25
Doubao       0.20     0.25     —

Top-15 source overlap (Jaccard). Lower = more divergence. Across our panel, no pair of engines exceeds 0.30 — at most about seven of each engine's fifteen top sources are shared with any other engine.

A 0.20–0.30 Jaccard means roughly half to two-thirds of the top sources differ between any pair of engines. This is the structural reason a generic "AI visibility" audit built for ChatGPT or Gemini cannot be ported to the Chinese engines without rebuilding the source-substrate model. It is also the reason a DeepSeek-only optimisation strategy under-reports what your brand looks like on Qwen and Doubao.
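Jaccard on two top-15 sets is intersection over union, so the implied shared fraction is smaller than the raw score suggests. A minimal sketch with hypothetical source names shows what a 0.20 score means in shared-source terms:

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical top-15 lists sharing exactly 5 sources.
top_deepseek = {f"src{i}" for i in range(15)}
top_doubao = {f"src{i}" for i in range(10, 25)}

j = jaccard(top_deepseek, top_doubao)
print(j)  # 5 shared / 25 in union -> 0.2

# Fraction of either top-15 that is shared: 5/15, i.e. one-third.
shared_frac = len(top_deepseek & top_doubao) / 15
print(round(shared_frac, 3))  # -> 0.333
```

In general, for two equal-size top-N sets with Jaccard J, the shared count is 2NJ/(1+J): at N=15, J=0.20 gives 5 shared sources and J=0.30 gives about 7.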

Reliability — top-5 vs top-15

The previous two charts describe a single panel. Reliability asks: do the same prompts produce the same source rankings on a re-run? We re-ran the identical 30-prompt panel one week later and report multiple statistics. The headline:

Top-15 source membership stability (κ)

DeepSeek 0.89
Qwen 0.78
Doubao 0.46

Cohen's κ for top-15 source-membership agreement across two consecutive re-runs of the identical 30-prompt panel. Top-5 stability was κ=1.00 for all three engines.
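The κ statistic here can be computed as plain Cohen's kappa over binary in/out-of-top-15 membership labels, scored across a fixed universe of candidate sources. A minimal sketch, with an illustrative universe size and hypothetical source names:

```python
def cohens_kappa(labels1, labels2):
    """Cohen's kappa for two parallel binary label lists."""
    n = len(labels1)
    po = sum(a == b for a, b in zip(labels1, labels2)) / n  # observed agreement
    p1 = sum(labels1) / n
    p2 = sum(labels2) / n
    pe = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def top15_kappa(run1_top, run2_top, universe):
    """Kappa on in/out-of-top-15 membership over all candidate sources."""
    return cohens_kappa([s in run1_top for s in universe],
                        [s in run2_top for s in universe])

# Illustrative: 40 candidate sources; the re-run keeps 12 of 15 and swaps 3.
universe = [f"s{i}" for i in range(40)]
run1 = set(universe[:15])
run2 = set(universe[:12]) | set(universe[15:18])
print(round(top15_kappa(run1, run2, universe), 2))  # -> 0.68
```

Identical top-15 sets give κ = 1.00 regardless of universe size, which is why κ_top-5 alone (where all three engines score 1.00) is uninformative about long-tail churn.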

Three things to read out of this chart: DeepSeek's top-15 source set is highly stable across re-runs (κ = 0.89); Qwen's is moderately stable (κ = 0.78); and Doubao's long tail churns substantially (κ = 0.46), even though its top-5, like the other two engines', reproduced perfectly.

Practical implication: when reading any Doubao-specific source-graph recommendation we make, treat the top-5 sources as actionable and the long tail with appropriate caveats. We disclose this in every Doubao-related readout, because a reliability table that reports only κ_top-5 (where every engine scores 1.00) and hides κ_top-15 is reporting selectively.

What this means for your strategy

The comparative view above leads to a small number of practical conclusions for US and UK brands considering Mainland-CN AI visibility work:

Measure all three, separately

Optimising for one engine on the assumption that the others will follow is a known failure mode. The 0.20–0.30 cross-engine Jaccard means a DeepSeek-tuned strategy misses most of what Doubao surfaces on the source side: at the observed 0.20 DeepSeek–Doubao Jaccard, only five of Doubao's fifteen top sources appear in DeepSeek's top-15. The Eastbound free multi-engine audit reports each engine separately for this reason.

Pick the engine your category actually surfaces on

Some categories surface decisively on one engine. Regulated categories (pharma, medical, financial services) skew Qwen because of its institutional source mix. Aspirational consumer / FMCG / beauty / travel skew Doubao because of its commerce-and-lifestyle lean. Developer-leaning B2B SaaS skews DeepSeek because of its developer-corpus weight on technical questions. We separate consumer-facing and developer-facing prompt pools in our panels because the source-mix patterns differ materially.

Read engine-specific playbooks in order

Each of the three engines has its own optimisation logic. The full playbooks:

For DeepSeek-only rank tracking specifically, see the DeepSeek SEO rank tracker — narrower in scope than the multi-engine audit, faster to run.

What we do not claim from this comparison. Findings are descriptive, not causal — we measure what each engine self-attributes when answering. We have not inspected training corpora and cannot. Findings on the Chinese engines do not transfer to ChatGPT, Claude, Gemini or Perplexity, which we measure separately when paid engagements require it. Numbers cited here are stable across the May 2026 re-run we did, but as model versions change over time, periodic re-measurement is part of our paid monitoring engagements.

Methodology note

The numbers on this page come from our 540-call source-influence panel (30 prompts × 3 LLMs × 3 reps × 2 turns) plus the matched re-run for reliability. Engines were queried via their live API endpoints — DeepSeek (deepseek-chat), Qwen on DashScope international (qwen-plus), Doubao on BytePlus ModelArk international. Model IDs were logged at session start and end; we cannot guarantee identical model snapshots across runs because none of these endpoints exposes pinned-version handles.

Source attributions are normalised to canonical platform IDs (e.g., "Xiaohongshu" rather than a specific post). The Jaccard overlap is computed on top-15 source sets per engine; κ_top-5 and κ_top-15 are Cohen's κ for source membership agreement across the two consecutive re-runs.
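The normalisation step can be sketched as a URL-to-platform mapping. The `CANONICAL` table and the registrable-domain heuristic below are illustrative assumptions, not Eastbound's actual pipeline:

```python
from urllib.parse import urlparse

# Illustrative domain -> canonical platform ID table (placeholder, not
# Eastbound's real mapping).
CANONICAL = {
    "xiaohongshu.com": "Xiaohongshu",
    "bilibili.com": "Bilibili",
    "smzdm.com": "SMZDM",
    "zhihu.com": "Zhihu",
}

def canonical_platform(url: str) -> str:
    """Map a cited URL to a platform-level ID, dropping post/path detail."""
    host = urlparse(url).netloc.lower().split(":")[0]
    parts = host.split(".")
    # Naive registrable-domain guess: keep the last two labels.
    base = ".".join(parts[-2:]) if len(parts) >= 2 else host
    return CANONICAL.get(base, base)

print(canonical_platform("https://www.xiaohongshu.com/explore/abc123"))
# -> Xiaohongshu
```

The last-two-labels heuristic mishandles multi-part suffixes such as `.com.cn`; a production normaliser would use a public-suffix list instead.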

For the full methodology, see how Eastbound measures China AI visibility.

Run the audit on your URL

The free Eastbound audit reports DeepSeek + Qwen + Doubao on a stratified zh-CN consumer prompt panel and surfaces per-engine selection / absorption / mention scores plus the highest-leverage fixes for your specific URL.

Run free audit

Or read the China AI visibility pillar, the agency services, or our research.