# Markdown for Agents
Markdown has become the standard format for AI agents and LLMs to consume web content. Its explicit structure minimizes token waste and improves comprehension compared to raw HTML. pageflare provides two complementary approaches to make your site agent-friendly: real-time markdown content negotiation at the edge and static llms.txt generation at build time.
## Markdown content negotiation (PRO)

When your site is served through the pageflare edge, any page can be requested as markdown by sending the `Accept: text/markdown` header. The edge worker converts the optimized HTML to clean markdown on the fly and caches the result.
### How it works

1. An agent sends a request with `Accept: text/markdown`.
2. The edge worker checks for a cached markdown variant.
3. If available, it serves the markdown directly. If not, it converts the HTML and stores the result.
4. The response includes `Content-Type: text/markdown` and a `Vary: Accept` header so intermediate caches store HTML and markdown variants separately.
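The flow above can be sketched as a small handler. The helper names here (`cache`, `fetchHtml`, `convert`) are illustrative stand-ins, not pageflare's actual API:

```js
// Returns true when the Accept header lists text/markdown.
function wantsMarkdown(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .includes("text/markdown");
}

// Hypothetical negotiation step: serve the cached markdown variant when
// present, otherwise convert the HTML and store the result for next time.
async function serveMarkdown(url, cache, fetchHtml, convert) {
  let markdown = await cache.get(url);
  if (markdown === undefined) {
    markdown = convert(await fetchHtml(url));
    await cache.put(url, markdown);
  }
  return {
    body: markdown,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Vary": "Accept", // lets intermediate caches keep both variants
    },
  };
}
```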
### Requesting markdown

Using curl:

```sh
curl -H "Accept: text/markdown" https://example.com/about
```

Using JavaScript:

```js
const response = await fetch("https://example.com/about", {
  headers: { Accept: "text/markdown" },
});

const markdown = await response.text();
const tokens = response.headers.get("x-markdown-tokens");
```

### Response headers
| Header | Description |
|---|---|
| `Content-Type` | `text/markdown; charset=utf-8` |
| `Vary` | `Accept`; tells caches that the URL has multiple representations |
| `x-markdown-tokens` | Estimated token count of the markdown content |
| `X-pageflare-Status` | `optimized` when serving a processed page |
The `x-markdown-tokens` header lets agents estimate cost before processing the content.
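For example, an agent might use the header to skip pages that exceed its context budget. The helper below is a sketch of that idea, not part of any pageflare SDK; the header name comes from the table above, while the budget policy is an assumption:

```js
// Returns true when the page's estimated token count fits within maxTokens.
// Accepts either a fetch Headers object or a plain header map.
function fitsTokenBudget(headers, maxTokens) {
  const raw = typeof headers.get === "function"
    ? headers.get("x-markdown-tokens")
    : headers["x-markdown-tokens"];
  const tokens = Number.parseInt(raw ?? "", 10);
  // Treat a missing or malformed header as unknown: do not assume it fits.
  return Number.isFinite(tokens) && tokens <= maxTokens;
}
```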
### How the conversion works

pageflare converts HTML to markdown during the optimization workflow. The converter extracts the page’s main content, strips navigation chrome and boilerplate, and produces clean markdown with preserved heading hierarchy, links, code blocks, and lists. The result is stored alongside the HTML in object storage and served on demand.
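The converter itself is internal to pageflare, but the shape of the transformation can be illustrated with a deliberately tiny sketch. A real converter uses a proper HTML parser; the regexes here only cover headings, links, and paragraphs, purely for illustration:

```js
// Minimal illustration of HTML-to-markdown conversion: headings become
// "#" prefixes, anchors become [text](href), remaining tags are stripped.
function htmlToMarkdownSketch(html) {
  return html
    .replace(/<h([1-6])[^>]*>(.*?)<\/h\1>/gis,
      (_, level, text) => "#".repeat(Number(level)) + " " + text.trim() + "\n")
    .replace(/<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/gis, "[$2]($1)")
    .replace(/<\/?p[^>]*>/gi, "\n")
    .replace(/<[^>]+>/g, "") // drop any tags not handled above
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```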
## llms.txt generation

At build time, pageflare generates `llms.txt` files: structured indexes of your site’s content designed for LLM consumption. Unlike markdown content negotiation (which requires the edge), `llms.txt` files are static assets that work on any hosting platform.
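For orientation, an index following the llms.txt specification's format typically looks like the following. The page names and URLs here are invented for illustration:

```markdown
# My Site

> A description of your site for AI agents

## Docs

- [Getting Started](https://example.com/docs/getting-started): Installation and setup
- [Configuration](https://example.com/docs/configuration): All available options

## Blog

- [Launch Post](https://example.com/blog/launch): Why we built this
```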
### What gets generated

| File | Description |
|---|---|
| `llms.txt` | Site index with page titles, descriptions, and paths |
| `llms-full.txt` | Full content of all pages concatenated as markdown |
| `robots.txt` | Generated or augmented with AI-specific directives |
Sub-indexes can be generated for specific sections of your site (e.g., `docs/llms.txt`, `api/llms.txt`) by configuring `llm.sub_indexes`.
### Configuration

Add the `llm` section to your `pageflare.jsonc`:

```jsonc
{
  "llm": {
    "enabled": true,
    "theme": "structured",
    "base_url": "https://example.com",
    "site_name": "My Site",
    "site_description": "A description of your site for AI agents",
    "sub_indexes": ["docs", "blog"],
    "robots_policy": "allow-all",
    "content_signals": {
      "search": "yes",
      "ai_input": "yes",
      "ai_train": "no"
    }
  }
}
```

### Themes

The `theme` option controls the output format of `llms.txt`:
| Theme | Description |
|---|---|
| `structured` | Hierarchical with sections, descriptions, and metadata (default) |
| `compact` | Minimal: titles and URLs only |
| `detailed` | Full page summaries alongside links |
| `default` | Basic format following the llms.txt specification |
### Robots policy

The `robots_policy` preset generates a `robots.txt` with appropriate directives for AI crawlers:
| Preset | Effect |
|---|---|
| `allow-all` | No restrictions on any crawler (default) |
| `block-ai-training` | Blocks crawlers known to collect training data (GPTBot, CCBot, etc.) |
| `block-all-ai` | Blocks all known AI crawlers |
| `block-all` | Blocks all crawlers via `Disallow: /` |
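As an illustration, the `block-ai-training` preset might produce a `robots.txt` like the one below. The user agents shown are the examples named in the table above; the full crawler list pageflare actually blocks is not specified here:

```
# Illustrative output for robots_policy: "block-ai-training"
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```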
### Content signals

Content signals are per-page metadata hints that tell AI systems how they may use your content. They follow the emerging content signals specification.
| Signal | Values | Meaning |
|---|---|---|
| `search` | `yes` / `no` | Whether content may appear in AI search results |
| `ai_input` | `yes` / `no` | Whether content may be used as context in AI responses |
| `ai_train` | `yes` / `no` | Whether content may be used for model training |
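The content signals specification is still emerging, so the exact serialization may differ, but as a hypothetical example the configuration above could be expressed in `robots.txt` roughly like this:

```
User-Agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```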
### Platform support

| Capability | Edge | Vercel / Netlify | Static hosting |
|---|---|---|---|
| Markdown content negotiation | Automatic | Via generated middleware | Not available |
| `llms.txt` generation | Automatic | At build time | At build time |
| `robots.txt` generation | Automatic | At build time | At build time |
| `x-markdown-tokens` header | Yes | Yes (middleware) | Not available |
For Vercel and Netlify, pageflare generates platform-specific middleware that handles the `Accept: text/markdown` content negotiation. On static hosting without middleware support, use `llms.txt` to make your content discoverable by AI agents.
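The generated middleware is platform-specific, but its core decision can be sketched as a pure function. The `.md` asset-path scheme below is an assumption for illustration, not pageflare's documented layout:

```js
// Rewrite a request path to a prebuilt markdown asset when the client
// asks for text/markdown; otherwise leave the path unchanged.
function rewriteForMarkdown(pathname, acceptHeader) {
  const asksForMarkdown = (acceptHeader ?? "")
    .split(",")
    .some((p) => p.split(";")[0].trim().toLowerCase() === "text/markdown");
  if (!asksForMarkdown) return pathname;
  // Map /about -> /about.md (and / -> /index.md); scheme is hypothetical.
  if (pathname === "/") return "/index.md";
  return pathname.replace(/\/$/, "") + ".md";
}
```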
## Related

- CLI Configuration — LLM Files — full configuration reference
- Pipeline — how the build pipeline generates these files