# llms.txt vs robots.txt — what each one does, and why you need both

**Canonical:** https://www.eastbound.ai/llms-txt-vs-robots-txt/
**Updated:** 2026-05-07

A surprising number of teams ship one file or the other and assume the missing one is redundant. The two are not interchangeable. `robots.txt` is a negative gate that controls whether crawlers fetch your URLs. `llms.txt` is a positive index that tells AI engines which Markdown-rendered content represents your site. AI-ready sites ship both, and they ship them with different intent.

## At a glance

|  | robots.txt | llms.txt |
|---|---|---|
| Origin | Robots Exclusion Protocol, IETF RFC 9309 (2022). De facto since 1994. | llms.txt proposal by Jeremy Howard (Answer.AI), 2024. Community draft; not standardised. |
| Polarity | Negative. Compliant bots default to "allow"; the file *removes* URLs from that allow set. | Positive. The file *names* the URLs and Markdown alternates an engine should read. |
| Audience | Any crawler that respects RFC 9309. | AI engines that look for it; support is currently partial but expanding. Not a search-engine signal. |
| Path | `/robots.txt` at the host root. | `/llms.txt` at the host root. |
| Format | Directives keyed by `User-agent:`; `Allow:`, `Disallow:`, `Sitemap:`. | Structured Markdown: H1 site name, blockquote summary, sectioned link lists. |
| Enforcement | Honour-system. Compliant bots respect it; malicious bots ignore it. | Honour-system. No enforcement; engines fetch it if they look for it. |
| If absent | Default-allow. | Engines fall back to HTML rendering with all the parsing noise that implies. |

**The simple rule.** If you want a URL kept out of a crawler's reach, that is a `robots.txt` job. If you want an AI engine to find your *preferred* content quickly, that is an `llms.txt` job. Neither file does the other's work.

## What robots.txt actually does

The Robots Exclusion Protocol was formalised in [RFC 9309](https://datatracker.ietf.org/doc/html/rfc9309). The semantics are narrow: a compliant crawler reads the file before fetching anything else on the host, matches its `User-agent` against the listed groups, and treats any URL covered by `Disallow:` as off-limits. `Allow:` lifts a more specific path out of a broader `Disallow:`.

A minimal AI-aware `robots.txt`:

```
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /private/

User-agent: anthropic-ai
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
```
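
To see how a compliant client evaluates those rules, Python's standard-library parser can replay them offline (a minimal sketch; the URLs are placeholders):

```
from urllib.robotparser import RobotFileParser

# A condensed copy of the rules above, inlined so the check runs offline.
RULES = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

# The wildcard group allows everything.
print(rp.can_fetch("*", "https://www.example.com/blog/"))           # True
# GPTBot matches its own group: /private/ is blocked, the rest stays open.
print(rp.can_fetch("GPTBot", "https://www.example.com/private/x"))  # False
print(rp.can_fetch("GPTBot", "https://www.example.com/pricing/"))   # True
```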

Three things that `robots.txt` **cannot** do:

- **It cannot tell engines where the good content is.** The file lists what is off-limits, not what is preferred.
- **It cannot pass a Markdown alternate.** The protocol has no concept of content format.
- **It cannot block a non-compliant bot.** The whole protocol is voluntary; real blocking happens at the server layer, as sketched below.
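
Robots-level rules are requests, not enforcement. Keeping a bot out for real happens at the server or CDN layer; a blunt nginx sketch (the user-agent string is a placeholder, and user-agents are trivially spoofed, so treat this as a deterrent rather than a guarantee):

```
# Inside the server block: refuse any request whose User-Agent matches.
if ($http_user_agent ~* "badbot") {
    return 403;
}
```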

## What llms.txt actually does

`llms.txt` was proposed in 2024 by Jeremy Howard (Answer.AI) as an AI-specific analogue to `sitemap.xml` with two differences. First, it is Markdown rather than XML, which matches how LLMs prefer to ingest content. Second, it is editorial — it lists the content the site *wants* the AI to read, not every URL on the host.

The shape:

```
# Eastbound

> Eastbound is a Hong Kong-based China AI visibility consultancy.

## Links

- [Homepage](https://www.eastbound.ai/)
- [China AI visibility (pillar)](https://www.eastbound.ai/china-ai-visibility/)
- [AI visibility audit](https://www.eastbound.ai/ai-visibility-audit/)

## Markdown alternates

- [Homepage (Markdown)](https://www.eastbound.ai/index.md)
- [China AI visibility (Markdown)](https://www.eastbound.ai/china-ai-visibility/index.md)
- [AI visibility audit (Markdown)](https://www.eastbound.ai/ai-visibility-audit/index.md)
```

Engines that look for `/llms.txt` can prefer the listed Markdown alternates over rendered HTML — cleaner extraction, no navigation noise, no boilerplate.
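
That preference is cheap to act on because the format needs no DOM and no render step. An illustration (not any engine's published parser) of how little work the flat link list demands:

```
import re

# A fragment in the llms.txt shape shown above.
llms_txt = """\
## Links
- [Homepage](https://www.eastbound.ai/)

## Markdown alternates
- [Homepage (Markdown)](https://www.eastbound.ai/index.md)
"""

# One regex recovers every title/URL pair from the flat Markdown list.
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")
for title, url in LINK.findall(llms_txt):
    kind = "markdown" if url.endswith(".md") else "html"
    print(f"{kind:9} {title}: {url}")
```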

What `llms.txt` **cannot** do:

- **It is not a block list.** Listing fewer URLs in `llms.txt` does not stop engines from crawling the rest.
- **It is not a search-engine signal.** Google, Bing and Baidu do not currently use it for indexing.
- **It is not universally honoured.** Engine support is partial. Treat it as upside, not as a guarantee.

## Where teams confuse the two

A team blocks `GPTBot` in `robots.txt` for training-opt-out reasons, then ships `llms.txt` assuming it overrides the block for citation purposes. It does not. `llms.txt` has no enforcement layer; it is a hint. `robots.txt` is the gate.

OpenAI keeps separate user-agents for training (`GPTBot`) and search (`OAI-SearchBot`). The only way to be opted out of training while still appearing in ChatGPT search results is to allow `OAI-SearchBot` in `robots.txt` and disallow `GPTBot`.
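
In `robots.txt` terms, the split is two user-agent groups:

```
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
```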

**Precedence:** when both files apply, `robots.txt` always wins. A URL listed in `llms.txt` but disallowed in `robots.txt` will not be fetched by a compliant bot. The reverse, a URL that is crawlable but absent from `llms.txt`, is fine: the file simply doesn't flag it as a preferred entry point.
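
That precedence is easy to audit. A minimal sketch, assuming both files sit at the root of the placeholder host below, that flags any `llms.txt` entry a given bot could not fetch:

```
import re
import urllib.request
from urllib.robotparser import RobotFileParser

HOST = "https://www.example.com"  # placeholder; use your canonical host
BOT = "GPTBot"                    # any user-agent you want to audit

# Load the live robots.txt rules.
rp = RobotFileParser()
rp.set_url(f"{HOST}/robots.txt")
rp.read()

# Pull every URL out of llms.txt and test it against those rules.
with urllib.request.urlopen(f"{HOST}/llms.txt") as resp:
    body = resp.read().decode("utf-8")

for url in re.findall(r"\]\((https?://[^)\s]+)\)", body):
    if not rp.can_fetch(BOT, url):
        print(f"conflict: {url} listed in llms.txt but blocked for {BOT}")
```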

## The minimum AI-ready configuration

1. **`/robots.txt`** — allow at `User-agent: *`; allow named search-and-citation bots (Googlebot, Bingbot, OAI-SearchBot, PerplexityBot, ClaudeBot, plus Google-Extended, which is a control token read by Googlebot rather than a separate crawler); disallow named training bots (GPTBot, anthropic-ai, CCBot) if your editorial position is opt-out-of-training; include a `Sitemap:` directive.
2. **`/sitemap.xml`** — exhaustive URL list with `<lastmod>` and `<priority>`; the minimal per-URL shape is sketched after this list. Submit to Google Search Console and Bing Webmaster Tools.
3. **`/llms.txt`** — editorial index of the pages you want AI engines to prefer, with a parallel "Markdown alternates" section.
4. **Per-page Markdown alternates** — every page in `llms.txt` ships a corresponding `.md` file at the canonical path and references it in the HTML head: `<link rel="alternate" type="text/markdown" href="/foo/index.md" />`.
5. **`/llms-full.txt`** (optional) — a single-file long-form Markdown bundle of the site's most valuable content.
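
The per-URL sitemap shape referenced in item 2, with a placeholder host and date:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2026-05-07</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
```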

Each is a baseline-level intervention — necessary infrastructure, not a result. Shipping these signals does not in itself produce citation lift. See [/ai-crawler-readiness/](https://www.eastbound.ai/ai-crawler-readiness/) for the full configuration guide.

## Format gotchas

- **Case sensitivity.** URL paths are case-sensitive on most webservers. The files must be named exactly `robots.txt` and `llms.txt`, all lower-case.
- **Path placement.** Both files belong at the host root, not in subdirectories. `/blog/llms.txt` is invisible to engines.
- **Per-subdomain.** `blog.example.com` needs its own `/llms.txt` and `/robots.txt` independent of `www.example.com`.
- **Content-Type.** Serve both with `text/plain` (or `text/markdown` for `llms.txt`). Avoid `application/octet-stream`, which some bots refuse; a server-config sketch follows this list.
- **Canonicalisation.** If the apex 301s to www, the files must resolve on the canonical host; the safe pattern is to serve them from both apex and www.
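
On nginx, forcing the `text/markdown` variant takes an exact-match location that clears the extension mapping first (a sketch; `.txt` already maps to `text/plain` by default, and other servers have their own equivalents):

```
# Force text/markdown for llms.txt; the point either way is to make sure
# the response is never application/octet-stream.
location = /llms.txt {
    types { }
    default_type text/markdown;
}
```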

## Related reading

- [AI crawler readiness — the infrastructure layer](https://www.eastbound.ai/ai-crawler-readiness/)
- [AI crawler blocking mistakes](https://www.eastbound.ai/ai-crawler-blocking-mistakes/)
- [Markdown alternates guide](https://www.eastbound.ai/markdown-alternates-guide/)
- [IndexNow setup guide](https://www.eastbound.ai/indexnow-setup-guide/)
- [China AI visibility pillar](https://www.eastbound.ai/china-ai-visibility/)
