cloudflare
diff --git a/src/content/changelog/workers-ai/2026-03-04-new-markdown-conversion-options.mdx b/src/content/changelog/workers-ai/2026-03-04-new-markdown-conversion-options.mdx
Lines changed: 45 additions & 0 deletions
diff --git a/src/content/docs/workers-ai/features/markdown-conversion.mdx b/src/content/docs/workers-ai/features/markdown-conversion.mdx
Lines changed: 0 additions & 213 deletions
diff --git a/src/content/docs/workers-ai/features/markdown-conversion/conversion-options.mdx b/src/content/docs/workers-ai/features/markdown-conversion/conversion-options.mdx
Lines changed: 85 additions & 0 deletions
diff --git a/src/content/docs/workers-ai/features/markdown-conversion/how-it-works.mdx b/src/content/docs/workers-ai/features/markdown-conversion/how-it-works.mdx
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,45 @@
+---
+title: New conversion options for Markdown Conversion
+description: Control how images, HTML, and PDFs are processed when converting to Markdown
+date: 2026-03-04
+---
+
+import { TypeScriptExample } from "~/components";
+
+You can now customize how the [Markdown Conversion](/workers-ai/features/markdown-conversion/) service processes different file types by passing a `conversionOptions` object.
+
+Available options:
+
+- **Images**: Set the language for AI-generated image descriptions
+- **HTML**: Use CSS selectors to extract specific content, or provide a hostname to resolve relative links
+- **PDF**: Exclude metadata from the output
+
+Use the [`env.AI`](/workers-ai/features/markdown-conversion/usage/binding/) binding:
+
+<TypeScriptExample>
+
+```typescript
+await env.AI.toMarkdown(
+	{ name: "page.html", blob: new Blob([html]) },
+	{
+		conversionOptions: {
+			html: { cssSelector: "article.content" },
+			image: { descriptionLanguage: "es" },
+		},
+	},
+);
+```
+
+</TypeScriptExample>
+
+Or call the REST API:
+
+```bash
+curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \
+  -H 'Authorization: Bearer {API_TOKEN}' \
+  -F 'files=@index.html' \
+  -F 'conversionOptions={"html": {"cssSelector": "article.content"}}'
+```
+
+For more details, refer to [Conversion Options](/workers-ai/features/markdown-conversion/conversion-options/).
@@ -0,0 +1,85 @@
+---
+title: Conversion Options
+pcx_content_type: reference
+sidebar:
+  order: 5
+---
+
+By default, the `toMarkdown` service extracts text content from your files. To control how specific file types are converted, pass conversion options to the service.
+
+Options are organized by file type and are all optional.
+
+## Available options
+
+### Images
+
+```typescript
+{
+  image?: {
+    descriptionLanguage?: 'en' | 'it' | 'de' | 'es' | 'fr' | 'pt';
+  }
+}
+```
+
+- `descriptionLanguage`: controls the language of the AI-generated image descriptions.
+
+:::caution
+
+This option works on a _best-effort_ basis: it is not guaranteed that the resulting text will be in the desired language.
+
+:::
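Because only the six language codes above are accepted, it can help to validate user-supplied values before building the options object. The following is a minimal sketch; `isDescriptionLanguage` and `imageOptionsFor` are hypothetical helper names, not part of the Workers AI API.

```typescript
// The language codes listed in the type above.
const DESCRIPTION_LANGUAGES = ["en", "it", "de", "es", "fr", "pt"] as const;
type DescriptionLanguage = (typeof DESCRIPTION_LANGUAGES)[number];

// Hypothetical helper: narrows an arbitrary string to a supported code.
function isDescriptionLanguage(code: string): code is DescriptionLanguage {
  return (DESCRIPTION_LANGUAGES as readonly string[]).indexOf(code) !== -1;
}

// Hypothetical helper: returns the `image` options for supported codes and
// an empty object otherwise, so the option is simply omitted from the request.
function imageOptionsFor(
  code: string,
): { image?: { descriptionLanguage: DescriptionLanguage } } {
  return isDescriptionLanguage(code)
    ? { image: { descriptionLanguage: code } }
    : {};
}
```

The result of `imageOptionsFor` can be spread into a larger `conversionOptions` object alongside `html` or `pdf` options.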
+
+### HTML
+
+```typescript
+{
+  html?: {
+    hostname?: string;
+    cssSelector?: string;
+  }
+}
+```
+
+- `hostname`: string to use as a host when resolving relative links inside the HTML.
+
+- `cssSelector`: string containing a CSS selector pattern to pick specific elements from your HTML. Refer to [how HTML is processed](/workers-ai/features/markdown-conversion/how-it-works/#html) for more details.
+
+### PDF
+
+```typescript
+{
+  pdf?: {
+    metadata?: boolean;
+  }
+}
+```
+
+- `metadata`: controls whether PDF metadata is included in the converted output. Previously, converted PDF files always included metadata; this option allows you to opt out of that behavior.
+
+## Examples
+
+### Binding
+
+To configure custom options, pass a `conversionOptions` object inside the second argument of the binding call, like this:
+
+```typescript
+await env.AI.toMarkdown(..., {
+  conversionOptions: {
+    html: { ... },
+    pdf: { ... },
+    ...
+  }
+})
+```
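A fuller sketch of the same call with the placeholders filled in is shown below. The `MarkdownConverter` interface is a hypothetical, minimal stand-in for the `env.AI` binding type, and the return type is left as `unknown` because the response shape is not described on this page. The exact format expected for `hostname` (for example, with or without a scheme) is also an assumption.

```typescript
// Option shapes copied from the "Available options" section.
interface ConversionOptions {
  image?: { descriptionLanguage?: "en" | "it" | "de" | "es" | "fr" | "pt" };
  html?: { hostname?: string; cssSelector?: string };
  pdf?: { metadata?: boolean };
}

// Hypothetical minimal view of the binding; only the call used here is modeled.
interface MarkdownConverter {
  toMarkdown(
    file: { name: string; blob: Blob },
    options?: { conversionOptions?: ConversionOptions },
  ): Promise<unknown>;
}

// Converts an HTML page, keeping only elements that match `article.content`
// and passing a hostname for resolving relative links.
function convertArticle(
  ai: MarkdownConverter,
  html: string,
  hostname: string,
): Promise<unknown> {
  return ai.toMarkdown(
    { name: "page.html", blob: new Blob([html], { type: "text/html" }) },
    {
      conversionOptions: {
        html: { cssSelector: "article.content", hostname },
      },
    },
  );
}
```

In a Worker, this would be invoked with the binding, for example `await convertArticle(env.AI, html, "example.com")`, assuming the binding's type is compatible with the minimal interface above.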
+
+### REST API
+
+Because the REST API accepts file uploads, the request's `Content-Type` is `multipart/form-data`. To pass options, add a `conversionOptions` form field whose value is the JSON-stringified options object:
+
+```bash
+curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \
+  -X POST \
+  -H 'Authorization: Bearer {API_TOKEN}' \
+  ...
+  -F 'conversionOptions={ "html": { ... }, ... }'
+```
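The same request body can be sketched in TypeScript with the standard `FormData` API. `buildToMarkdownForm` is a hypothetical helper; the field names `files` and `conversionOptions` are taken from the `curl` examples in this change.

```typescript
// Builds the multipart body: one `files` entry plus the JSON-stringified options.
function buildToMarkdownForm(
  file: Blob,
  fileName: string,
  conversionOptions: object,
): FormData {
  const form = new FormData();
  form.append("files", file, fileName);
  form.append("conversionOptions", JSON.stringify(conversionOptions));
  return form;
}

const form = buildToMarkdownForm(
  new Blob(["<article class='content'>Hello</article>"], { type: "text/html" }),
  "index.html",
  { html: { cssSelector: "article.content" } },
);

// Sending it (not executed here; ACCOUNT_ID and API_TOKEN are placeholders):
// await fetch(
//   `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/tomarkdown`,
//   {
//     method: "POST",
//     headers: { Authorization: `Bearer ${API_TOKEN}` },
//     body: form, // fetch sets the multipart Content-Type and boundary itself
//   },
// );
```

Note that the `Content-Type` header is deliberately not set by hand: when the body is a `FormData` instance, `fetch` generates the header together with the multipart boundary.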
@@ -0,0 +1,51 @@
+---
+title: How it works
+pcx_content_type: concept
+sidebar:
+  order: 4
+---
+
+## Pre-processing
+
+Before a file is converted to Markdown, it goes through cleanup steps that depend on its type.
+
+### HTML
+
+When we detect an HTML file, the following steps are applied to its content before it is converted:
+
+- Some elements are ignored, including `script` and `style` tags.
+- Meta tags are extracted. These include `title`, `description`, `og:title`, `og:description` and `og:image`.
+- [JSON-LD](https://json-ld.org/) content is extracted, if it exists, and appended to the end of the converted Markdown.
+- The base URL to use for resolving relative links is extracted from the `<base>` element<sup>1</sup>, if it exists, according to the spec (that is, only the first `<base>` element counts).
+- If the `cssSelector` option is:
+  - present, then only those elements that match the selector are kept for further processing;
+  - missing, then elements such as `<header>`, `<footer>` and `<head>` are removed from the text.
+- If a base URL was obtained previously, relative links in the remaining HTML are resolved to fully qualified URLs.
+
+<sup>1</sup> The host can also be set per request, using the HTML conversion
+options. Refer to [Conversion
+Options](/workers-ai/features/markdown-conversion/conversion-options/#html) for
+more details.
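The link-resolution step can be illustrated with the standard `URL` constructor. This is a sketch of the general technique, not the service's actual implementation; `resolveLink` is a hypothetical helper.

```typescript
// Resolves one link against an optional base URL using the WHATWG URL parser.
// Links that are already absolute keep their own origin; without a base URL,
// the link is returned as-is.
function resolveLink(href: string, baseUrl?: string): string {
  if (!baseUrl) return href;
  try {
    return new URL(href, baseUrl).href;
  } catch {
    return href; // unparseable input is kept rather than dropped
  }
}

const base = "https://example.com/blog/post/";
const rootRelative = resolveLink("/about", base);
const pathRelative = resolveLink("../img/a.png", base);
const alreadyAbsolute = resolveLink("https://other.org/x", base);
const noBase = resolveLink("/about");
```

Here `rootRelative` resolves against the origin, `pathRelative` resolves against the base path, and `noBase` is left untouched because no base URL is available.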
+
+### Images
+
+Images take a bit more work to prepare for conversion.
+
+As a first step, we detect the image type. If it is an SVG (Scalable Vector Graphics) file, we convert it to PNG internally, because the Workers AI models used in the following steps would otherwise fail on a non-raster format.
+
+Afterwards:
+
+- We try to determine the image's dimensions. If successful, we check whether the image is "too big": an image is too big if its width is greater than 1280px or its height is greater than 720px.
+- If the image is too big, we try to resize it to fit within those dimensions. If resizing fails, we fall back to the original image data.
+- The image is sent to an **object-detection model**. Specifically, we use the [`@cf/facebook/detr-resnet-50`](/workers-ai/models/detr-resnet-50/) model from Workers AI.
+- If any objects were detected in the previous step, they are appended to a prompt that is used to instruct an **image-to-text model** on how to describe the image.
+- If a preferred conversion language is specified in the request's conversion options, the previous prompt is enriched with a directive for the model to output the content in the desired language. Refer to [Conversion Options](/workers-ai/features/markdown-conversion/conversion-options/#images) for more details.
+- The final prompt is sent, along with the image data, to the [`@cf/google/gemma-3-12b-it`](/workers-ai/models/gemma-3-12b-it/) model, also from Workers AI.
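The size check from the steps above can be sketched as follows. The 1280px and 720px limits come from this page; preserving the aspect ratio and rounding to whole pixels are assumptions, since the exact resize strategy is not described. `isTooBig` and `fitWithinLimits` are hypothetical names.

```typescript
const MAX_WIDTH = 1280;
const MAX_HEIGHT = 720;

// An image is "too big" if either dimension exceeds its limit.
function isTooBig(width: number, height: number): boolean {
  return width > MAX_WIDTH || height > MAX_HEIGHT;
}

// Assumption: the image is scaled down uniformly (aspect ratio preserved)
// until both dimensions fit, and the result is rounded to whole pixels.
function fitWithinLimits(
  width: number,
  height: number,
): { width: number; height: number } {
  if (!isTooBig(width, height)) return { width, height };
  const scale = Math.min(MAX_WIDTH / width, MAX_HEIGHT / height);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}
```

Under these assumptions, a 2560x1440 image would be scaled to 1280x720, while an 800x600 image would be left unchanged.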
+
+### PDFs
+
+- Metadata is extracted. This can be removed from the final result. Refer to [Conversion Options](/workers-ai/features/markdown-conversion/conversion-options/#pdf) for more details.
+- Each page is parsed in sequence.
+- We try to obtain a `StructTree` object from the PDF file. This data structure is a tree of tagged elements that make up the PDF contents, as specified by [ISO 14289 (PDF/UA)](https://www.iso.org/standard/64599.html).
+- If none is obtained, we extract the text of the page _as-is_ and return it.
+- If we manage to obtain a `StructTree`, we traverse its nodes to build a semantic Markdown representation of its contents.
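The traversal in the last step can be sketched as a recursive walk over tagged nodes. The `StructNode` shape below is hypothetical and simplified (real structure trees reference page content rather than carrying text inline), and the role-to-Markdown mapping is an assumption for illustration; only the role names `H1` to `H6`, `P`, and `LI` are standard PDF structure types.

```typescript
// Hypothetical, simplified node shape: a role tag, child nodes, optional text.
interface StructNode {
  role: string;
  children: StructNode[];
  text?: string;
}

// Gathers all text under a node, collapsing runs of whitespace.
function collectText(node: StructNode): string {
  const parts: string[] = node.text ? [node.text] : [];
  for (const child of node.children) parts.push(collectText(child));
  return parts.join(" ").replace(/\s+/g, " ").trim();
}

// Emits one Markdown block per heading, paragraph, or list item;
// any other role is treated as a container and its children are visited.
function structTreeToMarkdown(node: StructNode): string[] {
  const heading = /^H([1-6])$/.exec(node.role);
  if (heading) {
    return ["######".slice(0, Number(heading[1])) + " " + collectText(node)];
  }
  if (node.role === "P") return [collectText(node)];
  if (node.role === "LI") return ["- " + collectText(node)];
  const blocks: string[] = [];
  for (const child of node.children) blocks.push(...structTreeToMarkdown(child));
  return blocks;
}
```

Because headings, paragraphs, and list items are identified by their tags rather than by visual layout, the output keeps the document's semantic structure, which is what the plain-text fallback cannot provide.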