
Markdown alternates — a working how-to.

Per-page Markdown alternates let AI engines fetch your content as clean Markdown instead of HTML. The signal is small in absolute terms but compounds: cleaner extraction, fewer parsing errors, less navigation noise carried into citations. This page is the configuration guide — what to ship, where to put it, and how each major site stack handles it.

Companion to AI crawler readiness and llms.txt vs robots.txt. Updated 2026-05-07.

Why a Markdown alternate, not just clean HTML

Modern HTML pages carry a lot of structural overhead: header navigation, sidebar links, related-article carousels, marketing CTAs, footer columns, GDPR banners, structured-data scripts, analytics tags. When an AI engine fetches the page to extract content for citation, it has to identify and discard all of that before reaching the body text. Some engines do this well; some do not. Even the ones that do well sometimes carry boilerplate strings into their answer paraphrases.

A Markdown alternate is the same content rendered to clean Markdown (body copy, headings, lists, code blocks, internal links) without the chrome. The engine can fetch it, parse it with near-zero noise, and ground its answer in the actual content. Several AI assistants and indexers look for Markdown alternates explicitly. Engines that do not look for them explicitly can still benefit when their content extractors follow the declaration in the HTML head to the cleaner version.

What this is not. Markdown alternates are not a ranking signal in the search-engine sense. Google does not boost pages with Markdown alternates. Bing does not. The benefit is downstream: when an engine cites your page, the citation is grounded in cleaner text. Treat this as extraction quality, not retrieval boost.

How to declare the alternate (the link rel pattern)

In the HTML page's <head>, add a single line:

<link rel="alternate" type="text/markdown" href="/foo/index.md" />

Three things matter about this line:

The type attribute is text/markdown, not text/plain. The IANA media type registration for Markdown is text/markdown; engines look for that exact value.
The href mirrors the page's location in the output directory rather than appending .md to the page URL. A page at /foo/ declares /foo/index.md. A page at /foo.html declares /foo.md. Match the structure of the directory, not the rendered URL.
One per page. Don't declare multiple Markdown alternates with different language or content variants — pick one canonical Markdown rendering per page.
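
The path rule above can be sketched as a small helper. This is a minimal illustration, not a standard: the function names are hypothetical, and the handling of extensionless URLs without a trailing slash is an assumption the text above does not cover.

```python
from html import escape
from posixpath import splitext


def alternate_href(page_path: str) -> str:
    """Map a page URL path to the path of its Markdown alternate."""
    if page_path.endswith("/"):
        # Directory-style URL: the alternate sits next to index.html.
        return page_path + "index.md"
    stem, ext = splitext(page_path)
    if ext in (".html", ".htm"):
        # File-style URL: swap the extension for .md.
        return stem + ".md"
    # Assumption: an extensionless URL without a trailing slash is
    # treated like a directory.
    return page_path + "/index.md"


def alternate_link_tag(page_path: str) -> str:
    """Build the single <link> element to place in the page's <head>."""
    href = escape(alternate_href(page_path), quote=True)
    return f'<link rel="alternate" type="text/markdown" href="{href}" />'
```

Calling alternate_link_tag("/foo/") yields the same line shown at the top of this section.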

Mirror this declaration in llms.txt in the "Markdown alternates" section, which gives engines a one-stop list of where every alternate lives. See the llms.txt format reference.

What to put in the .md file

The Markdown file is not a literal HTML-to-Markdown transform of the rendered page. It is a clean, content-only rendering. Strip:

Site header, navigation, breadcrumbs (these belong in llms.txt, not in every page)
CTAs, banners, modal triggers, GDPR consent strings
Sidebar promotional content, related-article carousels
Footer link columns, social links, copyright lines
Inline tracking, analytics scripts, ad slots

Keep:

The page title as H1
A short canonical/updated metadata block at the top (1–3 lines)
Body headings and paragraphs as Markdown
Lists, tables, code blocks, blockquotes
Internal links (full URLs, not relative — engines parsing standalone .md files don't always have base URL context)
External citations as proper Markdown links
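
The full-URL rule for internal links is easy to apply mechanically. The sketch below is one possible approach, assuming inline [text](target) links only. The function name is hypothetical, and the regex deliberately ignores reference-style links, targets containing parentheses, and links inside code blocks.

```python
import re
from urllib.parse import urljoin

# Matches the "](target" part of an inline Markdown link or image.
# Simplification: stops at the first ")" or whitespace in the target.
_LINK_TARGET = re.compile(r"(\]\()([^)\s]+)")


def absolutize_links(markdown: str, page_url: str) -> str:
    """Rewrite relative link targets as full URLs based on page_url."""

    def repl(match: re.Match) -> str:
        target = match.group(2)
        if target.startswith("#"):
            # In-page anchors still work inside a standalone .md file.
            return match.group(0)
        # urljoin leaves absolute URLs and other schemes (mailto:) untouched.
        return match.group(1) + urljoin(page_url, target)

    return _LINK_TARGET.sub(repl, markdown)
```

Pass the canonical URL of the HTML page as page_url, so that ../bar/ on https://www.example.com/foo/ resolves to https://www.example.com/bar/.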

The Markdown file should be content-equivalent to the HTML page, not a stripped-down summary. If the HTML body has 1,800 words, the Markdown file should have ~1,800 words. Truncating defeats the purpose: engines that fetch the alternate will use what is there. Short alternates produce short citations.

Stack patterns

Three common site architectures and how each ships Markdown alternates:

Static site (Hugo, Jekyll, 11ty, Astro, Next.js SSG)

The natural fit. Most static-site generators already source content from Markdown files. The simplest pattern is to render {slug}.md to {slug}/index.html and copy the same source .md through the build to {slug}/index.md in the output directory. Hugo's output formats support this directly: declare a markdown output format alongside html in config.toml, point both at the same template tree, and Hugo emits both files per page. Astro and 11ty support a similar setup through plugins or build hooks.

CMS-backed (WordPress, Contentful, Sanity, Strapi)

Trickier because the source-of-truth is structured data, not Markdown. Two patterns work. First, render Markdown server-side: add a /foo/index.md route that takes the same content fetch as /foo/, renders it through a Markdown serializer (e.g. Turndown for HTML→MD, or a custom block-renderer), and returns it with Content-Type: text/markdown. Second, build-time export: a nightly job iterates all CMS entries and writes static .md files to the same path served by your CDN. Build-time is simpler; render-time stays fresh.
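
The build-time export pattern might look like the sketch below. It assumes the CMS entries have already been fetched and serialized to Markdown. The entry fields (slug, title, updated, body_markdown) and the function name are hypothetical, not any CMS's real schema.

```python
from pathlib import Path


def export_entries(entries, out_dir, site_url):
    """Write one {slug}/index.md per entry and return the written paths."""
    base = site_url.rstrip("/")
    written = []
    for entry in entries:
        slug = entry["slug"].strip("/")
        target = Path(out_dir) / slug / "index.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        canonical = f"{base}/{slug}/" if slug else f"{base}/"
        # Title as H1, then a short canonical/updated metadata block.
        header = (
            f"# {entry['title']}\n\n"
            f"Canonical: {canonical}\n"
            f"Updated: {entry['updated']}\n\n"
        )
        body = entry["body_markdown"].rstrip() + "\n"
        target.write_text(header + body, encoding="utf-8")
        written.append(target)
    return written
```

Point out_dir at the directory your CDN serves, so each .md file lands beside the HTML it mirrors.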

Edge-function dynamic (Cloudflare Workers, Vercel Edge, AWS Lambda@Edge)

For pages that aren't statically generated, an edge function intercepts *.md requests, fetches the upstream HTML, runs it through a server-side HTML-to-Markdown converter (turndown, html-to-md, node-html-markdown) with rules tuned to strip your specific chrome, and returns the result with Content-Type: text/markdown. Cache aggressively — once per content edit, not per request. The downside: HTML-to-Markdown converters are imperfect, and tuning the strip rules is iterative work.
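
To make "rules tuned to strip your specific chrome" concrete, here is a deliberately small converter built on Python's standard html.parser. It stands in for a real converter such as Turndown and is not an edge-runtime implementation. The CHROME_TAGS list is an assumption to tune per template, and the sketch keeps only headings, paragraphs, and list items, dropping links and inline formatting.

```python
from html.parser import HTMLParser

# Assumption: elements treated as site chrome, dropped with their contents.
CHROME_TAGS = {"header", "nav", "aside", "footer", "script", "style", "form"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCKS = set(HEADINGS) | {"p", "li"}


class BodyToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # greater than 0 while inside a chrome element
        self.prefix = None    # Markdown prefix of the open block, else None
        self.buffer = []      # text pieces collected for the open block
        self.blocks = []      # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in CHROME_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCKS:
            self.prefix = HEADINGS.get(tag, "- " if tag == "li" else "")
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in CHROME_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCKS and self.prefix is not None:
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = BodyToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks) + "\n"
```

A production converter also has to handle links, tables, code blocks, and nested lists, which is where most of the iterative tuning goes.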

Authoring once, rendering twice. The cleanest pattern across stacks: keep the source-of-truth content in Markdown, render Markdown directly to .md output, render Markdown through your template engine to .html output. The two outputs share a source; they cannot drift. HTML-to-Markdown conversion is a fallback for stacks where Markdown source isn't feasible.

Content-Type and HTTP details

The HTTP response for a Markdown alternate must declare Content-Type: text/markdown (or text/markdown; charset=utf-8). Several engines refuse to parse a .md file served as application/octet-stream or text/plain; they expect the registered media type.

On S3, the content type is set per object at upload time. Pass the flag explicitly for every .md file: aws s3 cp ... --content-type "text/markdown; charset=utf-8". On Cloudflare R2, set the Content-Type header on upload. On Netlify, configure _headers:

/*.md
  Content-Type: text/markdown; charset=utf-8

On Vercel, configure vercel.json:

{
  "headers": [
    {
      "source": "/(.*)\\.md",
      "headers": [
        { "key": "Content-Type", "value": "text/markdown; charset=utf-8" }
      ]
    }
  ]
}

On nginx or Apache, add a MIME-type mapping in the server config. Verify after deploy with curl -I https://www.example.com/foo/index.md — the response header should read Content-Type: text/markdown; charset=utf-8.
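
The same check can be scripted for a list of URLs. The sketch below uses only the Python standard library. The function names are hypothetical, and check_url makes a live HEAD request, so run it against your own deployed alternates.

```python
from email.message import Message
from urllib.request import Request, urlopen


def is_markdown_content_type(header_value: str) -> bool:
    """True if the header's media type is text/markdown, ignoring parameters."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lowercases the value and drops "; charset=..." etc.
    return msg.get_content_type() == "text/markdown"


def check_url(url: str) -> bool:
    """HEAD the URL and report whether it is served as text/markdown."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return is_markdown_content_type(response.headers.get("Content-Type", ""))
```

Both text/markdown and text/markdown; charset=utf-8 pass, while text/plain and application/octet-stream fail.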

Per-page alternates vs a single llms-full.txt bundle

Per-page alternates and a single-file llms-full.txt bundle solve overlapping but distinct problems. Per-page alternates let an engine fetch the specific page it's citing, which keeps payload small and matches the URL it would otherwise cite. A bundled llms-full.txt lets an engine ingest the whole site in one fetch, which suits engines that prefer to context-load before answering rather than fetch-on-demand.

The recommendation: ship per-page alternates as the baseline. Add llms-full.txt as a supplement when the site has fewer than ~50 pages and the total Markdown corpus fits in a few hundred kilobytes. Don't ship llms-full.txt for sites with thousands of pages: a several-megabyte bundle is expensive to fetch on every visit, so engines will rate-limit themselves and cache it aggressively.
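
As a rough sketch, the rule of thumb reduces to two checks. The 50-page limit comes from the recommendation above. The 300 KB byte limit is an assumed reading of "a few hundred kilobytes", and the function name is hypothetical.

```python
MAX_PAGES = 50                 # "fewer than ~50 pages"
MAX_BUNDLE_BYTES = 300 * 1024  # assumption for "a few hundred kilobytes"


def should_ship_bundle(pages: list[str]) -> bool:
    """Decide whether an llms-full.txt bundle is a reasonable supplement.

    pages holds the Markdown text of every per-page alternate.
    """
    total_bytes = sum(len(page.encode("utf-8")) for page in pages)
    return len(pages) < MAX_PAGES and total_bytes <= MAX_BUNDLE_BYTES
```

Adjust both constants to your own corpus. The per-page alternates remain the baseline either way.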

Per-page Markdown alternates: shipping checklist

Decide your source-of-truth pattern: Markdown source rendered twice, or HTML-to-Markdown conversion at build / edge.
Generate the .md file at the canonical path of every shipping page.
Verify the .md file is content-equivalent to the rendered HTML, with chrome stripped.
Set Content-Type: text/markdown; charset=utf-8 on the deploy.
Add <link rel="alternate" type="text/markdown" href="..." /> to the HTML head of the corresponding page.
List the alternate in /llms.txt under "Markdown alternates".
Smoke-test post-deploy: curl -sI for the content type, curl -s for the body.
Re-test after every content change. Markdown-HTML drift is the main failure mode in long-running sites.
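
One checklist item, the head declaration, is easy to verify mechanically after each deploy. The sketch below collects the Markdown alternates a page declares so a test can assert there is exactly one. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AlternateFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute names arrive lowercased; valueless attributes are None.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        media_type = attr.get("type", "").strip().lower()
        if "alternate" in rels and media_type == "text/markdown":
            self.hrefs.append(attr.get("href", ""))


def find_markdown_alternates(html: str) -> list[str]:
    finder = AlternateFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs
```

A page passes when the returned list has exactly one entry and that entry matches the path listed in /llms.txt.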

Use this with the rest of the readiness layer

Markdown alternates pair with llms.txt as the discoverability mechanism. Both pair with robots.txt as the gating layer and with sitemap.xml + IndexNow as the change-notification layer. Read the parent reference to see how the four signals stack, and check the common blocking mistakes page for the failure modes that bite sites in practice.
