# Markdown for Agents
Markdown has become the standard format for AI agents and LLMs to consume web content. Its explicit structure minimizes token waste and improves comprehension compared to raw HTML. pageflare provides two complementary approaches to make your site agent-friendly: real-time markdown content negotiation at the edge and static llms.txt generation at build time.
## Markdown content negotiation (PRO)

When your site is served through the pageflare edge, any page can be requested as markdown by sending the `Accept: text/markdown` header. The edge worker converts the optimized HTML to clean markdown on the fly and caches the result.
### How it works

1. An agent sends a request with `Accept: text/markdown`.
2. The edge worker checks for a cached markdown variant.
3. If available, it serves the markdown directly. If not, it converts the HTML and stores the result.
4. The response includes `Content-Type: text/markdown` and a `Vary: Accept` header so intermediate caches store HTML and markdown variants separately.
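The flow above can be sketched as a small handler. The helper names here (`cache`, `fetchHtml`, `convert`) are illustrative stand-ins, not pageflare's actual API:

```js
// Returns true when the Accept header lists text/markdown.
function wantsMarkdown(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .includes("text/markdown");
}

// Hypothetical negotiation step: serve the cached markdown variant when
// present, otherwise convert the HTML and store the result for next time.
async function serveMarkdown(url, cache, fetchHtml, convert) {
  let markdown = await cache.get(url);
  if (markdown === undefined) {
    markdown = convert(await fetchHtml(url));
    await cache.put(url, markdown);
  }
  return {
    body: markdown,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Vary": "Accept", // lets intermediate caches keep both variants
    },
  };
}
```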
### Requesting markdown

Using curl:

```sh
curl -H "Accept: text/markdown" https://example.com/about
```

Using JavaScript:

```js
const response = await fetch("https://example.com/about", {
  headers: { Accept: "text/markdown" },
});

const markdown = await response.text();
const tokens = response.headers.get("x-markdown-tokens");
```

### Response headers
| Header | Description |
|---|---|
| `Content-Type` | `text/markdown; charset=utf-8` |
| `Vary` | `Accept`; tells caches that the URL has multiple representations |
| `x-markdown-tokens` | Estimated token count of the markdown content |
| `X-pageflare-Status` | `optimized` when serving a processed page |
The `x-markdown-tokens` header lets agents estimate cost before processing the content.
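For example, an agent might use the header to skip pages that exceed its context budget. The helper below is a sketch of that idea, not part of any pageflare SDK; the header name comes from the table above, while the budget policy is an assumption:

```js
// Returns true when the page's estimated token count fits within maxTokens.
// Accepts either a fetch Headers object or a plain header map.
function fitsTokenBudget(headers, maxTokens) {
  const raw = typeof headers.get === "function"
    ? headers.get("x-markdown-tokens")
    : headers["x-markdown-tokens"];
  const tokens = Number.parseInt(raw ?? "", 10);
  // Treat a missing or malformed header as unknown: do not assume it fits.
  return Number.isFinite(tokens) && tokens <= maxTokens;
}
```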
### How the conversion works

pageflare converts HTML to markdown during the optimization workflow. The converter extracts the page’s main content, strips navigation chrome and boilerplate, and produces clean markdown with preserved heading hierarchy, links, code blocks, and lists. The result is stored alongside the HTML in object storage and served on demand.
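The converter itself is internal to pageflare, but the shape of the transformation can be illustrated with a deliberately tiny sketch. A real converter uses a proper HTML parser; the regexes here only cover headings, links, and paragraphs, purely for illustration:

```js
// Minimal illustration of HTML-to-markdown conversion: headings become
// "#" prefixes, anchors become [text](href), remaining tags are stripped.
function htmlToMarkdownSketch(html) {
  return html
    .replace(/<h([1-6])[^>]*>(.*?)<\/h\1>/gis,
      (_, level, text) => "#".repeat(Number(level)) + " " + text.trim() + "\n")
    .replace(/<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/gis, "[$2]($1)")
    .replace(/<\/?p[^>]*>/gi, "\n")
    .replace(/<[^>]+>/g, "") // drop any tags not handled above
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```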
## llms.txt generation

At build time, pageflare generates `llms.txt` files: structured indexes of your site’s content designed for LLM consumption. Unlike markdown content negotiation (which requires the edge), `llms.txt` files are static assets that work on any hosting platform.
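For orientation, an index following the llms.txt specification's format typically looks like the following. The page names and URLs here are invented for illustration:

```markdown
# My Site

> A description of your site for AI agents

## Docs

- [Getting Started](https://example.com/docs/getting-started): Installation and setup
- [Configuration](https://example.com/docs/configuration): All available options

## Blog

- [Launch Post](https://example.com/blog/launch): Why we built this
```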
### What gets generated

| File | Description |
|---|---|
| `llms.txt` | Site index with page titles, descriptions, and paths |
| `llms-full.txt` | Full content of all pages concatenated as markdown |
| `robots.txt` | Generated or augmented with AI-specific directives |
Sub-indexes can be generated for specific sections of your site (e.g., `docs/llms.txt`, `api/llms.txt`) by configuring `llm.sub_indexes`.
### Configuration

Add the `llm` section to your `pageflare.jsonc`:

```jsonc
{
  "llm": {
    "enabled": true,
    "theme": "structured",
    "base_url": "https://example.com",
    "site_name": "My Site",
    "site_description": "A description of your site for AI agents",
    "sub_indexes": ["docs", "blog"],
    "robots_policy": "allow-all",
    "content_signals": {
      "search": "yes",
      "ai_input": "yes",
      "ai_train": "no"
    }
  }
}
```

### Themes

The `theme` option controls the output format of `llms.txt`:
| Theme | Description |
|---|---|
| `structured` | Hierarchical with sections, descriptions, and metadata (default) |
| `compact` | Minimal: titles and URLs only |
| `detailed` | Full page summaries alongside links |
| `default` | Basic format following the llms.txt specification |
### Robots policy

The `robots_policy` preset generates a `robots.txt` with appropriate directives for AI crawlers:
| Preset | Effect |
|---|---|
| `allow-all` | No restrictions on any crawler (default) |
| `block-ai-training` | Blocks crawlers known to collect training data (GPTBot, CCBot, etc.) |
| `block-all-ai` | Blocks all known AI crawlers |
| `block-all` | Blocks all crawlers via `Disallow: /` |
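As an illustration, the `block-ai-training` preset might produce a `robots.txt` like the one below. The user agents shown are the examples named in the table above; the full crawler list pageflare actually blocks is not specified here:

```
# Illustrative output for robots_policy: "block-ai-training"
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```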
### Content signals

Content signals are per-page metadata hints that tell AI systems how they may use your content. They follow the emerging content signals specification.
| Signal | Values | Meaning |
|---|---|---|
| `search` | `yes` / `no` | Whether content may appear in AI search results |
| `ai_input` | `yes` / `no` | Whether content may be used as context in AI responses |
| `ai_train` | `yes` / `no` | Whether content may be used for model training |
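The content signals specification is still emerging, so the exact serialization may differ, but as a hypothetical example the configuration above could be expressed in `robots.txt` roughly like this:

```
User-Agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```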
### Platform support

| Capability | Edge | Vercel / Netlify | Static hosting |
|---|---|---|---|
| Markdown content negotiation | Automatic | Via generated middleware | Not available |
| `llms.txt` generation | Automatic | At build time | At build time |
| `robots.txt` generation | Automatic | At build time | At build time |
| `x-markdown-tokens` header | Yes | Yes (middleware) | Not available |
For Vercel and Netlify, pageflare generates platform-specific middleware that handles the `Accept: text/markdown` content negotiation. On static hosting without middleware support, use `llms.txt` to make your content discoverable by AI agents.
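The generated middleware is platform-specific, but its core decision can be sketched as a pure function. The `.md` asset-path scheme below is an assumption for illustration, not pageflare's documented layout:

```js
// Rewrite a request path to a prebuilt markdown asset when the client
// asks for text/markdown; otherwise leave the path unchanged.
function rewriteForMarkdown(pathname, acceptHeader) {
  const asksForMarkdown = (acceptHeader ?? "")
    .split(",")
    .some((p) => p.split(";")[0].trim().toLowerCase() === "text/markdown");
  if (!asksForMarkdown) return pathname;
  // Map /about -> /about.md (and / -> /index.md); scheme is hypothetical.
  if (pathname === "/") return "/index.md";
  return pathname.replace(/\/$/, "") + ".md";
}
```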
## Related

- CLI Configuration — LLM Files — full configuration reference
- Pipeline — how the build pipeline generates these files