[Bots] Clarify managed robots.txt page for readability
Improve introduction, define jargon, and reduce assumptions on the
managed robots.txt reference page:
- Rewrite intro to establish why (AI crawlers scraping content) before
describing the feature
- State voluntary compliance prominently instead of burying it
- Define 'directives' inline on first use
- Define Content Signals in prose before the code block
- Replace 'Free zones' with 'domains on the Free plan'
- Clarify relationship between robots.txt and AI Crawl Control
- Fix grammar: 'a HTTP' to 'an HTTP'
- Remove unused Render import
-Protect your website or application from AI crawlers by implementing a `robots.txt` file on your domain to direct AI bot operators on what content they can and cannot scrape for AI model training.
+AI companies use crawlers to collect website content for training language models, generating search answers, and other purposes. A `robots.txt` file at the root of your domain tells these crawlers which content they should or should not access. When you turn on the managed `robots.txt` setting, Cloudflare generates and maintains a `robots.txt` file that instructs known AI crawlers to stay away from your content.

-AI bots are expected to follow the `robots.txt` directives.
-
-`robots.txt` files express your preferences. They do not prevent crawler operators from crawling your content at a technical level. Some crawler operators may disregard your `robots.txt` preferences and crawl your content regardless of what your `robots.txt` file says.
+`robots.txt` compliance is voluntary. The file expresses your preferences, but it does not prevent crawlers from accessing your content at a technical level. Some crawler operators may disregard your `robots.txt` directives (instructions like `Disallow: /`) and crawl your content regardless.

 :::note
-Respecting `robots.txt` is voluntary. If you want to prevent crawling, use AI Crawl Control's [manage AI crawlers](/ai-crawl-control/features/manage-ai-crawlers/) feature.
+If you want to enforce crawl blocking rather than request it, use [AI Crawl Control](/ai-crawl-control/features/manage-ai-crawlers/). You can also use both features together — `robots.txt` to express your preferences and AI Crawl Control to enforce them.
 :::

 ## Compatibility with existing `robots.txt` files

-Cloudflare will independently check whether your website has an existing `robots.txt` file and update the behavior of this feature based on your website.
+Cloudflare detects whether your origin server already has a `robots.txt` file and adjusts accordingly — either merging with your existing file or creating one from scratch.

 ### Existing robots.txt file

-If your website already has a `robots.txt` file — verified by a HTTP `200` response — Cloudflare will prepend our managed `robots.txt` before your existing `robots.txt`, combining both into a single response.
+If your website already has a `robots.txt` file — verified by an HTTP `200` response — Cloudflare will prepend our managed `robots.txt` before your existing `robots.txt`, combining both into a single response.

 For example, without this feature enabled, the `robots.txt` content of `crawlstop.com` would be:
-If your website does not have a `robots.txt` file, Cloudflare creates a new file with our managed block directives and serves it for you.
+If your website does not have a `robots.txt` file, Cloudflare creates a new file with managed `Disallow` rules for known AI crawlers and serves it for you.

 ## Implementation

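For context on the file these two cases produce, here is a minimal sketch of what managed blocking directives in a `robots.txt` can look like. The bot names below are illustrative examples of known AI crawlers, not the actual list Cloudflare serves:

```txt
# Illustrative sketch only, not the file Cloudflare generates
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```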
@@ -139,9 +137,11 @@ To implement a `robots.txt` file on your domain:
 
 ## Content Signals Policy
 
-Free zones that do not have their own `robots.txt` file and do not use the managed `robots.txt` feature will display the Content Signals Policy when a crawler requests the `robots.txt` file for your zone.
+Content Signals are a set of machine-readable directives in a `robots.txt` file that categorize how crawlers may use your content. The three categories are `search` (building a search index), `ai-input` (feeding content into AI models for real-time answers), and `ai-train` (training or fine-tuning AI models).
+
+Domains on the Free plan that do not have their own `robots.txt` file and do not use the managed `robots.txt` feature will display the Content Signals Policy when a crawler requests the `robots.txt` file for your domain.
 
-This file only outlines the Content Signals framework. It does not express your preferences or rights associated with your content.
+The Content Signals Policy defines these categories but does not express any specific preferences about your content. To set preferences (for example, `ai-train=no`), turn on the managed `robots.txt` feature.
 
 ```txt title="Content Signals Policy"
 # As a condition of accessing this website, you agree to abide by the
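The distinction between the bare policy and an expressed preference can be made concrete with a short sketch. Assuming the `Content-Signal` directive syntax from the Content Signals framework, a `robots.txt` that actually sets preferences (as the managed feature does) might include:

```txt
# Hypothetical preference expression using the three signal categories
User-agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
```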