Skip to content

Commit 156405d

Browse files
authored
AI Search Docs Refactor (#28290)
* random space * small changes * limit change
1 parent 028aa77 commit 156405d

18 files changed

Lines changed: 72 additions & 280 deletions

src/content/docs/ai-search/concepts/how-ai-search-works.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ sidebar:
55
order: 2
66
---
77

8-
AI Search (formerly AutoRAG) is Cloudflare’s managed search service. You can connect your data such as websites or unstructured content, and it automatically creates a continuously updating index that you can query with natural language in your applications or AI agents.
8+
AI Search is Cloudflare’s managed search service. You can connect your data such as websites or unstructured content, and it automatically creates a continuously updating index that you can query with natural language in your applications or AI agents.
99

1010
AI Search consists of two core processes:
1111

src/content/docs/ai-search/configuration/chunking.mdx

Lines changed: 1 addition & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -20,9 +20,7 @@ This way, chunks are easy to embed and retrieve, without cutting off thoughts mi
2020

2121
AI Search exposes two parameters to help you control chunking behavior:
2222

23-
- **Chunk size**: The number of tokens per chunk.
24-
- Minimum: `64`
25-
- Maximum: `512`
23+
- **Chunk size**: The number of tokens per chunk. The option range may vary depending on the model.
2624
- **Chunk overlap**: The percentage of overlapping tokens between adjacent chunks.
2725
- Minimum: `0%`
2826
- Maximum: `30%`
@@ -33,16 +31,6 @@ These settings apply during the indexing step, before your data is embedded and
3331

3432
Chunking affects both how your content is retrieved and how much context is passed into the generation model. Try out this external [chunk visualizer tool](https://huggingface.co/spaces/m-ric/chunk_visualizer) to help understand how different chunk settings could look.
3533

36-
For chunk size, consider how:
37-
38-
- **Smaller chunks** create more precise vector matches, but may split relevant ideas across multiple chunks.
39-
- **Larger chunks** retain more context, but may dilute relevance and reduce retrieval precision.
40-
41-
For chunk overlap, consider how:
42-
43-
- **More overlap** helps preserve continuity across boundaries, especially in flowing or narrative content.
44-
- **Less overlap** reduces indexing time and cost, but can miss context if key terms are split between chunks.
45-
4634
### Additional considerations:
4735

4836
- **Vector index size:** Smaller chunk sizes produce more chunks and more total vectors. Refer to the [Vectorize limits](/vectorize/platform/limits/) to ensure your configuration stays within the maximum allowed vectors per index.

src/content/docs/ai-search/configuration/data-source/index.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ sidebar:
55
order: 2
66
---
77

8-
AI Search can directly ingest data from the following sources:
8+
AI Search can directly ingest data from the following sources:
99

1010
| Data Source | Description |
1111
|---------------|-------------|

src/content/docs/ai-search/configuration/data-source/r2.mdx

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -24,10 +24,7 @@ Refer to [Path filtering](/ai-search/configuration/path-filtering/) for pattern
2424

2525
## File limits
2626

27-
AI Search has different file size limits depending on the file type:
28-
29-
- **Plain text files:** Up to **4 MB**
30-
- **Rich format files:** Up to **4 MB**
27+
AI Search has a file size limit of **up to **4 MB**.
3128

3229
Files that exceed these limits will not be indexed and will show up in the error logs.
3330

src/content/docs/ai-search/configuration/data-source/website.mdx

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -146,9 +146,9 @@ If you have Security rules configured to block bot activity, you can add a rule
146146

147147
You can configure parsing options during onboarding or in your instance settings under **Parser options**.
148148

149-
### Sitemap
149+
### Specific sitemap
150150

151-
By default, AI Search crawls all sitemaps listed in your `robots.txt` in the order they appear (top to bottom). If you do not want the crawler to index everything, you can specify a single sitemap URL to limit which pages are crawled.
151+
By default, AI Search crawls all sitemaps listed in your `robots.txt` in the order they appear (top to bottom). If you do not want the crawler to index everything, you can specify a single sitemap URL to limit which pages are crawled. You can add up to 5 specific sitemaps.
152152

153153
### Rendering mode
154154

@@ -157,7 +157,7 @@ You can choose how pages are parsed during crawling:
157157
- **Static sites**: Downloads the raw HTML for each page.
158158
- **Rendered sites**: Loads pages with a headless browser and downloads the fully rendered version, including dynamic JavaScript content. Note that the [Browser Rendering](/browser-rendering/pricing/) limits and billing apply.
159159

160-
## Access protected content
160+
## Extra headers for access protected content
161161

162162
If your website has pages behind authentication or are only visible to logged-in users, you can configure custom HTTP headers to allow the AI Search crawler to access this protected content. You can add up to five custom HTTP headers to the requests AI Search sends when crawling your site.
163163

src/content/docs/ai-search/configuration/models/index.mdx

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -22,10 +22,7 @@ All AI Search instances support models from [Workers AI](/workers-ai). You can u
2222

2323
To use AI Search with other model providers:
2424

25-
1. Add provider keys to AI Gateway
26-
- Go to **AI > AI Gateway** in the dashboard.
27-
- Select or create an AI gateway.
28-
- In **Provider Keys**, choose your provider, click **Add**, and enter the key.
25+
1. Add provider keys to [AI Gateway](/ai-gateway/configuration/bring-your-own-keys/)
2926
2. Connect the gateway to AI Search
3027
- When creating a new AI Search, select the AI Gateway with your provider keys.
3128
- For an existing AI Search, go to **Settings** and switch to a gateway that has your keys under **Resources**.

src/content/docs/ai-search/configuration/path-filtering.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ Path filtering works with both [website](/ai-search/configuration/data-source/we
1313

1414
You can configure path filters when creating or editing an AI Search instance. In the dashboard, open **Path Filters** and add your include or exclude rules. You can also update path filters at any time from the **Settings** page of your instance.
1515

16-
When using the API, specify `include_items` and `exclude_items` in the `source_params` of your configuration:
16+
When using the REST API, specify `include_items` and `exclude_items` in the `source_params` of your configuration:
1717

1818
| Parameter | Type | Limit | Description |
1919
| --------------- | ---------- | ------------------- | -------------------------------------------------------- |

src/content/docs/ai-search/configuration/reranking.mdx

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,3 +72,6 @@ To update reranking for an existing instance:
7272
4. Under **Reranking**, toggle reranking on.
7373
5. Select the reranking model.
7474

75+
### Considerations
76+
77+
Adding reranking will include an additional step to the query request, as a result, there may be an increase in the latency of the request.

src/content/docs/ai-search/configuration/service-api-token.mdx

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,13 +15,16 @@ Service API tokens are required during the AI Search beta. This requirement may
1515

1616
When you create an AI Search instance, it needs to interact with other Cloudflare services on your behalf, such as [R2](/r2/), [Vectorize](/vectorize/), and [Workers AI](/workers-ai/). The service API token authorizes AI Search to perform these operations. Without it, AI Search cannot index your data or respond to queries.
1717

18+
This token requires the AI Search Index Engine permission (`9e9b428a0bcd46fd80e580b46a69963c`) which grants access to run AI Search Index Engine.
19+
20+
1821
## Service API token vs. AI Search API token
1922

2023
AI Search uses two types of API tokens for different purposes:
2124

2225
| Token type | Purpose | Who uses it | When to create |
2326
| ------------------- | ------------------------------------------------------------------- | -------------------- | ------------------------------------------------ |
24-
| Service API token | Grants AI Search permission to access R2, Vectorize, and Workers AI | AI Search (internal) | Once per account, during first instance creation |
27+
| Service API token | Grants AI Search permission to access R2, Vectorize, Browser Rendering and Workers AI | AI Search (internal) | Once per account, during first instance creation |
2528
| AI Search API token | Authenticates your requests to query or manage AI Search instances | You (external) | When calling the AI Search REST API |
2629

2730
The **service API token** is used internally by AI Search to perform background operations like indexing your content and generating responses. You create it once and AI Search uses it automatically.

src/content/docs/ai-search/configuration/system-prompt.mdx

Lines changed: 33 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -58,39 +58,6 @@ The system prompt for your AI Search can be set after it has been created:
5858
3. Go to the **Settings** tab.
5959
4. Go to **Query rewrite** or **Generation**, and edit the **System prompt**.
6060

61-
## Query rewriting system prompt
62-
63-
If query rewriting is enabled, you can provide a custom system prompt to control how the model rewrites user queries. In this step, the model receives:
64-
65-
- The query rewrite system prompt
66-
- The original user query
67-
68-
The model outputs a rewritten query optimized for semantic retrieval.
69-
70-
### Example
71-
72-
```text
73-
You are a search query optimizer for vector database searches. Your task is to reformulate user queries into more effective search terms.
74-
75-
Given a user's search query, you must:
76-
1. Identify the core concepts and intent
77-
2. Add relevant synonyms and related terms
78-
3. Remove irrelevant filler words
79-
4. Structure the query to emphasize key terms
80-
5. Include technical or domain-specific terminology if applicable
81-
82-
Provide only the optimized search query without any explanations, greetings, or additional commentary.
83-
84-
Example input: "how to fix a bike tire that's gone flat"
85-
Example output: "bicycle tire repair puncture fix patch inflate maintenance flat tire inner tube replacement"
86-
87-
Constraints:
88-
- Output only the enhanced search terms
89-
- Keep focus on searchable concepts
90-
- Include both specific and general related terms
91-
- Maintain all important meaning from original query
92-
```
93-
9461
## Generation system prompt
9562

9663
If you are using the AI Search API endpoint, you can use the system prompt to influence how the LLM responds to the final user query using the retrieved results. At this step, the model receives:
@@ -128,3 +95,36 @@ Important:
12895
- If documents contradict each other, note this and explain your reasoning for the chosen answer
12996
- Do not repeat the instructions
13097
```
98+
99+
## Query rewriting system prompt
100+
101+
If query rewriting is enabled, you can provide a custom system prompt to control how the model rewrites user queries. In this step, the model receives:
102+
103+
- The query rewrite system prompt
104+
- The original user query
105+
106+
The model outputs a rewritten query optimized for semantic retrieval.
107+
108+
### Example
109+
110+
```text
111+
You are a search query optimizer for vector database searches. Your task is to reformulate user queries into more effective search terms.
112+
113+
Given a user's search query, you must:
114+
1. Identify the core concepts and intent
115+
2. Add relevant synonyms and related terms
116+
3. Remove irrelevant filler words
117+
4. Structure the query to emphasize key terms
118+
5. Include technical or domain-specific terminology if applicable
119+
120+
Provide only the optimized search query without any explanations, greetings, or additional commentary.
121+
122+
Example input: "how to fix a bike tire that's gone flat"
123+
Example output: "bicycle tire repair puncture fix patch inflate maintenance flat tire inner tube replacement"
124+
125+
Constraints:
126+
- Output only the enhanced search terms
127+
- Keep focus on searchable concepts
128+
- Include both specific and general related terms
129+
- Maintain all important meaning from original query
130+
```

0 commit comments

Comments
 (0)