
# Markdown for Agents

Markdown has become the standard format for AI agents and LLMs to consume web content. Its explicit structure minimizes token waste and improves comprehension compared to raw HTML. pageflare provides two complementary approaches to make your site agent-friendly: real-time markdown content negotiation at the edge and static llms.txt generation at build time.

When your site is served through the pageflare edge, any page can be requested as markdown by sending the Accept: text/markdown header. The edge worker converts the optimized HTML to clean markdown on the fly and caches the result.

  1. An agent sends a request with Accept: text/markdown.
  2. The edge worker checks for a cached markdown variant.
  3. If available, it serves the markdown directly. If not, it converts the HTML and stores the result.
  4. The response includes Content-Type: text/markdown and a Vary: Accept header so intermediate caches store HTML and markdown variants separately.

Using curl:

```sh
curl -H "Accept: text/markdown" https://example.com/about
```

Using JavaScript:

```js
const response = await fetch("https://example.com/about", {
  headers: { Accept: "text/markdown" },
});
const markdown = await response.text();
const tokens = response.headers.get("x-markdown-tokens");
```

The markdown response includes the following headers:

| Header | Description |
| --- | --- |
| `Content-Type` | `text/markdown; charset=utf-8` |
| `Vary` | `Accept`, telling caches that the URL has multiple representations |
| `x-markdown-tokens` | Estimated token count of the markdown content |
| `X-pageflare-Status` | `optimized` when serving a processed page |

The x-markdown-tokens header lets agents estimate cost before processing the content.
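
For instance, an agent can compare the advertised token count against its context budget before committing to process a page. The helper below is an illustrative sketch: the header name is from the docs, while the chars-per-token fallback heuristic is an assumption.

```js
// Sketch: decide whether to process a page based on its advertised token
// count. Falls back to a rough ~4-chars-per-token estimate (an assumption)
// when the x-markdown-tokens header is absent.
function withinBudget(response, markdown, maxTokens) {
  const header = response.headers.get("x-markdown-tokens");
  const tokens = header ? Number(header) : Math.ceil(markdown.length / 4);
  return tokens <= maxTokens;
}
```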

pageflare converts HTML to markdown during the optimization workflow. The converter extracts the page’s main content, strips navigation chrome and boilerplate, and produces clean markdown with preserved heading hierarchy, links, code blocks, and lists. The result is stored alongside the HTML in object storage and served on demand.
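
As a rough illustration of that pipeline (not pageflare's actual converter, which is internal), a naive regex-based pass might look like the sketch below. A production converter would use a real HTML parser rather than regexes.

```js
// Illustrative sketch only: strip non-content chrome, then map a few
// common tags to markdown.
function htmlToMarkdown(html) {
  return html
    // Drop navigation chrome and boilerplate wholesale.
    .replace(/<(nav|header|footer|aside|script|style)[\s\S]*?<\/\1>/gi, "")
    // Preserve heading hierarchy.
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_, n, text) => `\n${"#".repeat(Number(n))} ${text.trim()}\n`)
    // Preserve links.
    .replace(/<a[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // Preserve list items.
    .replace(/<li[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n")
    // Drop any remaining tags and collapse extra blank lines.
    .replace(/<[^>]+>/g, "")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```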

At build time, pageflare generates llms.txt files — structured indexes of your site’s content designed for LLM consumption. Unlike markdown content negotiation (which requires the edge), llms.txt files are static assets that work on any hosting platform.

| File | Description |
| --- | --- |
| `llms.txt` | Site index with page titles, descriptions, and paths |
| `llms-full.txt` | Full content of all pages concatenated as markdown |
| `robots.txt` | Generated or augmented with AI-specific directives |

Sub-indexes can be generated for specific sections of your site (e.g., docs/llms.txt, api/llms.txt) by configuring llm.sub_indexes.

Add the llm section to your pageflare.jsonc:

```jsonc
{
  "llm": {
    "enabled": true,
    "theme": "structured",
    "base_url": "https://example.com",
    "site_name": "My Site",
    "site_description": "A description of your site for AI agents",
    "sub_indexes": ["docs", "blog"],
    "robots_policy": "allow-all",
    "content_signals": {
      "search": "yes",
      "ai_input": "yes",
      "ai_train": "no"
    }
  }
}
```
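
With this configuration, the generated llms.txt could look roughly like the following. This sketch follows the llms.txt convention of an H1 title, a blockquote summary, and per-section link lists; the page entries shown are hypothetical, and the exact layout depends on the theme option.

```markdown
# My Site

> A description of your site for AI agents

## docs

- [Getting Started](https://example.com/docs/getting-started): Hypothetical docs page
- [Configuration](https://example.com/docs/configuration): Hypothetical docs page

## blog

- [Hello World](https://example.com/blog/hello-world): Hypothetical blog post
```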

The theme option controls the output format of llms.txt:

| Theme | Description |
| --- | --- |
| `structured` | Hierarchical with sections, descriptions, and metadata (default) |
| `compact` | Minimal: titles and URLs only |
| `detailed` | Full page summaries alongside links |
| `default` | Basic format following the llms.txt specification |

The robots_policy preset generates a robots.txt with appropriate directives for AI crawlers:

| Preset | Effect |
| --- | --- |
| `allow-all` | No restrictions on any crawler (default) |
| `block-ai-training` | Blocks crawlers known to collect training data (GPTBot, CCBot, etc.) |
| `block-all-ai` | Blocks all known AI crawlers |
| `block-all` | Blocks all crawlers via `Disallow: /` |
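
For example, the block-ai-training preset could emit something like the following. GPTBot and CCBot are named in the table; the full crawler list pageflare blocks is not specified here, so treat this as an illustrative sketch rather than the exact output.

```
# Illustrative robots.txt for robots_policy: "block-ai-training"
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```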

Content signals are per-page metadata hints that tell AI systems how they may use your content. They follow the emerging content signals specification.

| Signal | Values | Meaning |
| --- | --- | --- |
| `search` | `yes` / `no` | Whether content may appear in AI search results |
| `ai_input` | `yes` / `no` | Whether content may be used as context in AI responses |
| `ai_train` | `yes` / `no` | Whether content may be used for model training |

Capability support varies by hosting platform:

| Capability | Edge | Vercel / Netlify | Static hosting |
| --- | --- | --- | --- |
| Markdown content negotiation | Automatic | Via generated middleware | Not available |
| llms.txt generation | Automatic | At build time | At build time |
| robots.txt generation | Automatic | At build time | At build time |
| `x-markdown-tokens` header | Yes | Yes (middleware) | Not available |

For Vercel and Netlify, pageflare generates platform-specific middleware that handles the Accept: text/markdown content negotiation. On static hosting without middleware support, use llms.txt to make your content discoverable by AI agents.
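
To illustrate what such middleware has to do, the sketch below shows the two core steps: detecting a markdown request and mapping it to a stored variant. This is a hedged sketch, not the code pageflare generates; in particular, the `.md` sibling-path layout is an assumption.

```js
// Returns true when the client lists text/markdown in its Accept header.
function wantsMarkdown(acceptHeader) {
  return (acceptHeader ?? "")
    .split(",")
    .some((part) => part.trim().split(";")[0] === "text/markdown");
}

// Maps a page URL to a hypothetical ".md" sibling path where the
// pre-converted markdown variant might be stored.
function markdownVariantUrl(url) {
  const u = new URL(url);
  u.pathname = u.pathname.replace(/\/$/, "") + ".md";
  return u.toString();
}
```

Middleware built on these helpers would call `wantsMarkdown` on the incoming request's `Accept` header and, when it returns true, serve the variant URL with `Content-Type: text/markdown` and `Vary: Accept` set on the response.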