
Commit 511a187

[Bots] Clarify managed robots.txt page for readability
Improve introduction, define jargon, and reduce assumptions on the managed robots.txt reference page:

- Rewrite intro to establish why (AI crawlers scraping content) before describing the feature
- State voluntary compliance prominently instead of burying it
- Define 'directives' inline on first use
- Define Content Signals in prose before the code block
- Replace 'Free zones' with 'domains on the Free plan'
- Clarify relationship between robots.txt and AI Crawl Control
- Fix grammar: 'a HTTP' to 'an HTTP'
- Remove unused Render import
1 parent bc45489 commit 511a187

1 file changed


src/content/docs/bots/additional-configurations/managed-robots-txt.mdx

Lines changed: 11 additions & 11 deletions
@@ -6,25 +6,23 @@ sidebar:
   label: robots.txt setting
 ---

-import { Render, Tabs, TabItem, Steps, DashButton } from "~/components";
+import { Tabs, TabItem, Steps, DashButton } from "~/components";

-Protect your website or application from AI crawlers by implementing a `robots.txt` file on your domain to direct AI bot operators on what content they can and cannot scrape for AI model training.
+AI companies use crawlers to collect website content for training language models, generating search answers, and other purposes. A `robots.txt` file at the root of your domain tells these crawlers which content they should or should not access. When you turn on the managed `robots.txt` setting, Cloudflare generates and maintains a `robots.txt` file that instructs known AI crawlers to stay away from your content.

-AI bots are expected to follow the `robots.txt` directives.
-
-`robots.txt` files express your preferences. They do not prevent crawler operators from crawling your content at a technical level. Some crawler operators may disregard your `robots.txt` preferences and crawl your content regardless of what your `robots.txt` file says.
+`robots.txt` compliance is voluntary. The file expresses your preferences, but it does not prevent crawlers from accessing your content at a technical level. Some crawler operators may disregard your `robots.txt` directives (instructions like `Disallow: /`) and crawl your content regardless.

 :::note
-Respecting `robots.txt` is voluntary. If you want to prevent crawling, use AI Crawl Control's [manage AI crawlers](/ai-crawl-control/features/manage-ai-crawlers/) feature.
+If you want to enforce crawl blocking rather than request it, use [AI Crawl Control](/ai-crawl-control/features/manage-ai-crawlers/). You can also use both features together — `robots.txt` to express your preferences and AI Crawl Control to enforce them.
 :::

 ## Compatibility with existing `robots.txt` files

-Cloudflare will independently check whether your website has an existing `robots.txt` file and update the behavior of this feature based on your website.
+Cloudflare detects whether your origin server already has a `robots.txt` file and adjusts accordingly — either merging with your existing file or creating one from scratch.

 ### Existing robots.txt file

-If your website already has a `robots.txt` file — verified by a HTTP `200` response — Cloudflare will prepend our managed `robots.txt` before your existing `robots.txt`, combining both into a single response.
+If your website already has a `robots.txt` file — verified by an HTTP `200` response — Cloudflare will prepend our managed `robots.txt` before your existing `robots.txt`, combining both into a single response.

 For example, without this feature enabled, the `robots.txt` content of `crawlstop.com` would be:

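The rewritten paragraph in this hunk defines directives with `Disallow: /` as its example. For readers new to the syntax, a `robots.txt` directive group names a crawler by its `User-agent` string and lists rules for it. The sketch below is illustrative only: `GPTBot` is one real AI crawler, and the actual crawler list in Cloudflare's managed file is not shown in this diff.

```txt
# Illustrative directive group, not Cloudflare's actual managed list:
# block the named crawler from every path on the site.
User-agent: GPTBot
Disallow: /
```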

@@ -106,7 +104,7 @@ Sitemap: https://www.crawlstop.com/sitemap.xml

 ### No robots.txt file

-If your website does not have a `robots.txt` file, Cloudflare creates a new file with our managed block directives and serves it for you.
+If your website does not have a `robots.txt` file, Cloudflare creates a new file with managed `Disallow` rules for known AI crawlers and serves it for you.

 ## Implementation

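This hunk covers the branch where no `robots.txt` exists; together with the prepend behavior changed above, the feature serves either a standalone managed file or a combined response. The page's `crawlstop.com` example sits outside these hunks, so the combined response below is a hypothetical reconstruction with an abbreviated crawler list; only the `Sitemap` URL is attested by the hunk header.

```txt
# Managed directives, prepended by Cloudflare (abbreviated, hypothetical)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Existing robots.txt returned by the origin (hypothetical content)
User-agent: *
Disallow: /admin/
Sitemap: https://www.crawlstop.com/sitemap.xml
```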

@@ -139,9 +137,11 @@ To implement a `robots.txt` file on your domain:

 ## Content Signals Policy

-Free zones that do not have their own `robots.txt` file and do not use the managed `robots.txt` feature will display the Content Signals Policy when a crawler requests the `robots.txt` file for your zone.
+Content Signals are a set of machine-readable directives in a `robots.txt` file that categorize how crawlers may use your content. The three categories are `search` (building a search index), `ai-input` (feeding content into AI models for real-time answers), and `ai-train` (training or fine-tuning AI models).
+
+Domains on the Free plan that do not have their own `robots.txt` file and do not use the managed `robots.txt` feature will display the Content Signals Policy when a crawler requests the `robots.txt` file for your domain.

-This file only outlines the Content Signals framework. It does not express your preferences or rights associated with your content.
+The Content Signals Policy defines these categories but does not express any specific preferences about your content. To set preferences (for example, `ai-train=no`), turn on the managed `robots.txt` feature.

 ```txt title="Content Signals Policy"
 # As a condition of accessing this website, you agree to abide by the
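The new prose names the three signal categories, but the syntax itself is cut off by the hunk boundary. In the Content Signals framework, preferences ride on a `Content-Signal` line inside a `robots.txt` group; the sketch below shows the `ai-train=no` example from the added text, assuming that line format, and is not the exact policy file Cloudflare serves.

```txt
# Sketch of Content Signals syntax (assumed format, not the served file):
# allow search indexing and real-time AI answers, opt out of AI training.
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```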
