Commit 3eddc36

mvvmm and ask-bonk[bot] authored
Use Markdown for Agents for all docs markdown generation (#28921)
* first pass refactor
* handle redirect in code, not cloudflare rules
* use style guide as example for markdown pages
* removed application handling on index.md
* update llms.txt to not use index.md by default
* copy edits
* copy edit
* llms.txt improvements
* dont use MFA abbreviation
* fix llms.txt spacing issue
* add back removed test
* Use correct url for llms.txt and markdown files
* remove duplicate llms.txt entries
* use index.md links in llms.txt
* llms.txt block quote updates
* use a better example url in ai tooling page
* add index.md to starlightLinksValidator exclude
* Update astro.config.ts
  Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
* Update src/pages/llms.txt.ts
  Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
* Fix llms.txt section labels sometimes being incorrect

---------

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
1 parent c163b4f commit 3eddc36

13 files changed

Lines changed: 243 additions & 348 deletions

astro.config.ts

Lines changed: 1 addition & 2 deletions
@@ -145,6 +145,7 @@ export default defineConfig({
           "/llms.txt",
           "/llms-full.txt",
           "**/llms.txt",
+          "**/index.md",
           "{props.*}",
           "/",
           "/glossary/",
@@ -156,10 +157,8 @@ export default defineConfig({
           "/workers/examples/?tags=*",
           "/workers/llms-full.txt",
           "/workers-ai/models/**",
-          "**index.md",
           "/markdown.zip",
           "/style-guide/index.md",
-          "/style-guide/fixtures/markdown/index.md",
           "/videos/**",
           "/search/**",
         ],

src/content/docs/style-guide/fixtures/index.mdx

Lines changed: 0 additions & 8 deletions
This file was deleted.

src/content/docs/style-guide/fixtures/markdown.mdx

Lines changed: 0 additions & 24 deletions
This file was deleted.

src/content/docs/style-guide/how-we-docs/ai-consumability.mdx

Lines changed: 16 additions & 11 deletions
@@ -41,13 +41,13 @@ The references to the panels `id`, usually handled by JavaScript, are visible bu
 
 ### Turning our components into "Markdownable" HTML
 
-To solve this, we created a [`rehype plugin`](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts) for:
+To solve this, we use [Markdown for Agents](/fundamentals/reference/markdown-for-agents/), which converts HTML to Markdown at the Cloudflare network layer. It handles:
 
-- Removing non-content tags (`script`, `style`, `link`, etc) via a [tags allowlist](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts#L19-L104)
-- [Transforming custom elements](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts#L189-L227) like `starlight-tabs` into standard unordered lists
-- [Adapting our Expressive Code codeblocks HTML](https://github.com/cloudflare/cloudflare-docs/blob/d5a19deded110bce6a7c5d45e702d36527da0a4e/src/plugins/rehype/filter-elements.ts#L143-L178) to the [HTML that CommonMark expects](https://spec.commonmark.org/0.31.2/#example-142)
+- Removing non-content tags (`script`, `style`, `link`, etc.)
+- Transforming custom elements like `starlight-tabs` into standard unordered lists
+- Adapting code block HTML into clean Markdown fenced code blocks
 
-Taking the `Tabs` example from the previous section and running it through our plugin will now give us a normal unordered list with the content properly associated with a given list item:
+Taking the `Tabs` example from the previous section, Markdown for Agents will give us a normal unordered list with the content properly associated with a given list item:
 
 ```md
 - One
@@ -59,20 +59,25 @@ Taking the `Tabs` example from the previous section and running it through our p
   Two Content
 ```
 
-For example, take a look at our Markdown test fixture (or any page by appending `/index.md` to the URL):
+You can request any page as Markdown in two ways:
 
-- [`/style-guide/fixtures/markdown/`](/style-guide/fixtures/markdown/)
-- [`/style-guide/fixtures/markdown/index.md`](/style-guide/fixtures/markdown/index.md)
+- Send a request with an `Accept: text/markdown` header:
+
+  ```bash
+  curl "https://developers.cloudflare.com/style-guide/ai-tooling/" \
+    --header "Accept: text/markdown"
+  ```
+
+- Append `index.md` to the URL — for example, [`/style-guide/ai-tooling/index.md`](/style-guide/ai-tooling/index.md)
 
 ### Saving on tokens
 
-Most AI pricing is around input & output tokens and our approach greatly reduces the amount of input tokens required.
+Most AI pricing is around input & output tokens and Markdown greatly reduces the amount of input tokens required.
 
 For example, let's take a look at the amount of tokens required for the [Workers Get Started](/workers/get-started/guide/) using [OpenAI's tokenizer](https://platform.openai.com/tokenizer):
 
 - HTML: 15,229 tokens
-- turndown: 3,401 tokens (4.48x less than HTML)
-- index.md: 2,110 tokens (7.22x less than HTML)
+- Markdown: 2,110 tokens (7.22x less than HTML)
 
 When providing our content to AI, we can see a real-world ~7x saving in input tokens cost.

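Reviewer note: the token ratios quoted in this hunk can be recomputed from the counts it lists. The following is a minimal sketch, not part of the commit; `reductionFactor` is a hypothetical helper, and the counts are copied from the diff above (including the removed turndown figure).

```typescript
// Token counts as quoted in the diff for the Workers "Get Started" guide.
const htmlTokens = 15229;
const turndownTokens = 3401; // from the removed "turndown" bullet
const markdownTokens = 2110;

// Hypothetical helper: how many times fewer tokens `reduced` needs than
// `baseline`, rounded to two decimal places.
function reductionFactor(baseline: number, reduced: number): number {
  return Math.round((baseline / reduced) * 100) / 100;
}

console.log(reductionFactor(htmlTokens, turndownTokens)); // 4.48
console.log(reductionFactor(htmlTokens, markdownTokens)); // 7.22
```

Both results agree with the figures in the documentation text, so the "~7x" summary is consistent with the listed counts.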
src/content/partials/style-guide/llms-txt.mdx

Lines changed: 6 additions & 4 deletions
@@ -11,15 +11,17 @@ We have implemented `llms.txt` and `llms-full.txt` as follows:
 
 To obtain a Markdown version of a single documentation page, you can:
 
-- Send a request to `/$page/index.md` — Add `/index.md` to the end of any page to get the Markdown version. For example, [`/style-guide/index.md`](/style-guide/index.md).
-
-- Send a request to `/$page/` with an `Accept` header requesting a page in Markdown format — Uses [Markdown for Agents](/fundamentals/reference/markdown-for-agents/). For example:
+- Send a request to any page with an `Accept: text/markdown` header — Uses [Markdown for Agents](/fundamentals/reference/markdown-for-agents/) to convert the page to Markdown at the network layer. For example:
 
   ```bash
-  curl "https://developers.cloudflare.com/style-guide/" \
+  curl "https://developers.cloudflare.com/style-guide/ai-tooling/" \
     --header "Accept: text/markdown"
   ```
 
+- Send a request to `/$page/index.md` — Add `/index.md` to the end of any page to get the Markdown version. For example, [`/style-guide/ai-tooling/index.md`](/style-guide/ai-tooling/index.md).
+
+Both methods return the same Markdown output, powered by [Markdown for Agents](/fundamentals/reference/markdown-for-agents/).
+
 In the top right of this page, you will see a `Page options` button where you can copy the current page as Markdown that can be given to your LLM of choice.
 
 <Width size="medium">

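Reviewer note: the two retrieval methods described in this partial differ only in where the Markdown request is signalled (header versus URL suffix). The following is a minimal sketch, not part of the commit; `markdownRequests` and the `MarkdownRequest` shape are hypothetical names introduced for illustration, and nothing is sent over the network.

```typescript
// Hypothetical description of a request: a URL plus the headers to send.
interface MarkdownRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: build both request shapes for one documentation page.
function markdownRequests(pageUrl: string): MarkdownRequest[] {
  // Normalize so the page URL ends with exactly one trailing slash.
  const page = pageUrl.endsWith("/") ? pageUrl : `${pageUrl}/`;
  return [
    // Method 1: same URL, with an Accept header asking for Markdown.
    { url: page, headers: { Accept: "text/markdown" } },
    // Method 2: no special header, with index.md appended to the URL.
    { url: `${page}index.md`, headers: {} },
  ];
}

const requests = markdownRequests(
  "https://developers.cloudflare.com/style-guide/ai-tooling/",
);
for (const request of requests) {
  // Each entry could be passed on as fetch(request.url, { headers: request.headers }).
  console.log(request.url, JSON.stringify(request.headers));
}
```

According to the partial, both requests should return the same Markdown output.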
src/middleware/index.ts

Lines changed: 0 additions & 19 deletions
@@ -1,5 +1,4 @@
 import { defineMiddleware } from "astro:middleware";
-import { htmlToMarkdown } from "~/util/markdown";
 
 // `astro dev` only middleware so that `/api/...` links can be viewed.
 export const onRequest = defineMiddleware(async (context, next) => {
@@ -14,24 +13,6 @@ export const onRequest = defineMiddleware(async (context, next) => {
           "accept-encoding": "identity",
         },
       });
-    } else if (
-      pathname.endsWith("/index.md") ||
-      context.request.headers.get("accept")?.includes("text/markdown")
-    ) {
-      const htmlUrl = new URL(pathname.replace("index.md", ""), context.url);
-      const html = await (await fetch(htmlUrl)).text();
-
-      const markdown = await htmlToMarkdown(html, context.url.toString());
-
-      if (!markdown) {
-        return new Response("Not Found", { status: 404 });
-      }
-
-      return new Response(markdown, {
-        headers: {
-          "content-type": "text/markdown; charset=utf-8",
-        },
-      });
     }
   }

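Reviewer note: the branch removed here decided whether a dev-server request wanted Markdown and which HTML page to convert. The following is a minimal sketch of that decision as pure functions, not the removed code itself; `wantsMarkdown` and `htmlPathFor` are hypothetical names, and `htmlPathFor` strips only a trailing `index.md`, whereas the removed code used `pathname.replace("index.md", "")`.

```typescript
// Hypothetical helper: true when the request asks for Markdown, either through
// the /index.md suffix or through an Accept header containing text/markdown.
function wantsMarkdown(pathname: string, accept: string | null): boolean {
  return (
    pathname.endsWith("/index.md") ||
    (accept?.includes("text/markdown") ?? false)
  );
}

// Hypothetical helper: map a Markdown path back to the HTML page path by
// dropping a trailing "index.md" (other paths are returned unchanged).
function htmlPathFor(pathname: string): string {
  const suffix = "index.md";
  return pathname.endsWith(`/${suffix}`)
    ? pathname.slice(0, -suffix.length)
    : pathname;
}

console.log(wantsMarkdown("/style-guide/index.md", null)); // true
console.log(wantsMarkdown("/style-guide/", "text/markdown")); // true
console.log(wantsMarkdown("/style-guide/", "text/html")); // false
console.log(htmlPathFor("/style-guide/index.md")); // "/style-guide/"
```

With this commit the check no longer runs in the Astro middleware; per the documentation changes above, the conversion is handled by Markdown for Agents at the network layer.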
src/pages/[...product]/llms.txt.ts

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
+import type { APIRoute, GetStaticPaths, InferGetStaticPropsType } from "astro";
+import { getCollection } from "astro:content";
+import dedent from "dedent";
+
+export const getStaticPaths = (async () => {
+  const directory = await getCollection("directory");
+
+  const docs = await getCollection("docs");
+
+  return directory
+    .map((entry) => {
+      const productUrl = entry.data.entry.url;
+      // Derive route segments from the product's canonical URL.
+      // e.g. /cloudflare-for-platforms/cloudflare-for-saas/ → ["cloudflare-for-platforms", "cloudflare-for-saas"]
+      // e.g. /workers/ → ["workers"]
+      // Skip the root URL "/" (home entry) and fragment-only anchors (e.g. /path/#anchor)
+      if (!productUrl || productUrl === "/" || productUrl.includes("#"))
+        return null;
+
+      const urlPath = productUrl.slice(1, -1); // strip leading/trailing slashes
+      if (!urlPath) return null;
+
+      const prefix = urlPath;
+      const pages = docs.filter(
+        (e) => e.id.startsWith(prefix + "/") || e.id === prefix,
+      );
+
+      if (pages.length === 0) return null;
+
+      return {
+        params: { product: urlPath },
+        props: { entry, pages },
+      };
+    })
+    .filter((p) => p !== null);
+}) satisfies GetStaticPaths;
+
+type Props = InferGetStaticPropsType<typeof getStaticPaths>;
+
+type Page = InferGetStaticPropsType<typeof getStaticPaths>["pages"][number];
+
+function formatPage(base: string, e: Page) {
+  const line = `- [${e.data.title}](${base}/${e.id}/index.md)`;
+  return e.data.description ? line.concat(`: ${e.data.description}`) : line;
+}
+
+interface Section {
+  label: string;
+  order: number | undefined;
+  indexPage: Page | undefined;
+  children: Page[];
+}
+
+function buildSections(prefix: string, pages: Page[]): Section[] | null {
+  const childPages = pages.filter((e) => e.id !== prefix);
+
+  // Find section index pages: pages that are exactly one directory level
+  // below the prefix and have children under them.
+  // e.g., for prefix "workers", find "workers/get-started", "workers/configuration", etc.
+  const sectionMap = new Map<string, Section>();
+
+  for (const page of childPages) {
+    const relative = page.id.slice(prefix.length + 1); // e.g., "get-started" or "get-started/guide"
+    const firstSegment = relative.split("/")[0];
+    const sectionId = `${prefix}/${firstSegment}`;
+
+    if (!sectionMap.has(sectionId)) {
+      sectionMap.set(sectionId, {
+        label: firstSegment,
+        order: undefined,
+        indexPage: undefined,
+        children: [],
+      });
+    }
+
+    const section = sectionMap.get(sectionId)!;
+
+    if (page.id === sectionId) {
+      // This is the section index page
+      section.indexPage = page;
+      section.label = page.data.title;
+      section.order = page.data.sidebar?.order;
+    } else {
+      section.children.push(page);
+    }
+  }
+
+  const sections = [...sectionMap.values()];
+
+  // Check if any sections have explicit sidebar ordering
+  const hasOrdering = sections.some((s) => s.order !== undefined);
+  if (!hasOrdering) return null;
+
+  // Sort sections: those with order first (by order), then those without (alphabetically by label)
+  sections.sort((a, b) => {
+    if (a.order !== undefined && b.order !== undefined)
+      return a.order - b.order;
+    if (a.order !== undefined) return -1;
+    if (b.order !== undefined) return 1;
+    return a.label.localeCompare(b.label);
+  });
+
+  return sections;
+}
+
+export const GET: APIRoute<Props> = async ({ props, url }) => {
+  const base = url.origin;
+  const { entry, pages } = props;
+  const { title, url: productUrl } = entry.data.entry;
+  const description = entry.data.meta?.description;
+
+  const prefix = productUrl.slice(1, -1);
+  const rootPage = pages.find((e) => e.id === prefix);
+  const rootLink = rootPage
+    ? formatPage(base, rootPage)
+    : `- [${title}](${base}${productUrl}index.md)`;
+
+  const sections = buildSections(prefix, pages);
+
+  let pageContent: string;
+
+  if (sections) {
+    // Grouped output with section headers
+    pageContent = sections
+      .map((section) => {
+        const heading = `## ${section.label}`;
+        const lines: string[] = [];
+        if (section.indexPage) {
+          lines.push(formatPage(base, section.indexPage));
+        }
+        for (const child of section.children) {
+          lines.push(formatPage(base, child));
+        }
+        return `${heading}\n\n${lines.join("\n")}`;
+      })
+      .join("\n\n");
+  } else {
+    // Flat fallback
+    const childPages = pages.filter((e) => e.id !== prefix);
+    pageContent = childPages.map((e) => formatPage(base, e)).join("\n");
+  }
+
+  const pagesSection = sections
+    ? `## Overview\n\n${rootLink}\n\n${pageContent}`
+    : `## ${title} documentation pages\n\n${rootLink}\n\n${pageContent}`;
+
+  const markdown = dedent(`
+    # ${title}
+
+    ${description ?? ""}
+
+    > Links below point directly to Markdown versions of each page. Any page can also be retrieved as Markdown by sending an \`Accept: text/markdown\` header to the page's URL without the \`index.md\` suffix (for example, \`curl -H "Accept: text/markdown" ${base}${productUrl}\`).
+    >
+    > For other Cloudflare products, see the [Cloudflare documentation directory](${base}/llms.txt).
+    >
+    > Use [${title} llms-full.txt](${base}${productUrl}llms-full.txt) for the complete ${title} documentation in a single file, intended for offline indexing, bulk vectorization, or large-context models.
+
+    ${pagesSection}
+  `);
+
+  return new Response(markdown, {
+    headers: {
+      "content-type": "text/plain",
+    },
+  });
+};

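Reviewer note: the ordering rule in `buildSections` (sections with an explicit sidebar order first, ascending, then the rest alphabetically by label) can be exercised on its own. The following is a minimal sketch that mirrors the comparator from the diff; `SectionLike`, `compareSections`, and the sample labels are hypothetical and exist only for illustration.

```typescript
// Hypothetical reduced shape: only the fields the comparator reads.
interface SectionLike {
  label: string;
  order?: number;
}

// Same rule as the sort callback in the diff: ordered sections come first
// (by ascending order), unordered sections follow alphabetically by label.
function compareSections(a: SectionLike, b: SectionLike): number {
  if (a.order !== undefined && b.order !== undefined) return a.order - b.order;
  if (a.order !== undefined) return -1;
  if (b.order !== undefined) return 1;
  return a.label.localeCompare(b.label);
}

// Made-up sample data, not taken from the docs collection.
const sample: SectionLike[] = [
  { label: "Reference" },
  { label: "Get started", order: 1 },
  { label: "Examples" },
  { label: "Configuration", order: 3 },
];

const sortedLabels = [...sample].sort(compareSections).map((s) => s.label);
console.log(sortedLabels);
// ["Get started", "Configuration", "Examples", "Reference"]
```

Note that in the commit this sort only runs when at least one section has an explicit order; otherwise `buildSections` returns `null` and the route falls back to a flat list.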
src/pages/[product]/llms.txt.ts

Lines changed: 0 additions & 63 deletions
This file was deleted.
