Category: Account
Got something
Yes — Cloudflare AI Search can use a website as the resource for building its search index and context, but that’s not the same as training the underlying AI model on your site for general foundational learning. Here’s the nuance:

✅ Using a Website as a Data Source for AI Search
Cloudflare’s AI Search product lets you point the service at a website (typically one you control) and crawl and index the site content automatically so the AI Search service can respond to queries using that site’s information. It:
Crawls your site pages, downloads them into storage (an R2 bucket),
Converts them to text and creates embeddings,
Builds a searchable vector index, which serves as the retrieval layer for Retrieval-Augmented Generation (RAG),
Lets you query that index so answers are contextualized to your site content.
That means the AI reads your website content and uses it as the knowledge base for answering search queries, chat responses, or internal enterprise search tasks. Your site serves as the retrieval source for that index, not as training data for the model.
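To make the crawl, embed, index, and query steps concrete, here is a minimal toy sketch. It is not Cloudflare’s implementation: the `embed` function is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector index, and the example.com pages are made-up sample data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy 'embedding': term counts (a stand-in for a real embedding model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages):
    """Turn crawled pages (url -> text) into (url, text, vector) entries."""
    return [(url, text, embed(text)) for url, text in pages.items()]


def search(index, query, top_k=1):
    """Return the top_k pages most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(url, text) for url, text, _ in ranked[:top_k]]


# Hypothetical crawled content.
pages = {
    "https://example.com/pricing": "Our pricing starts at ten dollars per month.",
    "https://example.com/support": "Contact support by email for help with your account.",
}
index = build_index(pages)
results = search(index, "How much does it cost per month?")
```

Here `results` holds the pricing page, because it is the only page that shares terms with the query. A real system would pass that retrieved text to the language model as context.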
❗But that is not the same as training/fine-tuning a model
Cloudflare does not use your website content to train or fine-tune the underlying large language model; in other words, the model’s weights are not adjusted. Instead, your content becomes part of the indexed knowledge store that the search system retrieves from at query time. This approach is usually called Retrieval-Augmented Generation (RAG), not foundational model training.
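A small sketch of the distinction, under stated assumptions: `model_weights` is a hypothetical stand-in for a pretrained model’s parameters, and `build_prompt` is an illustrative helper, not a Cloudflare API. With RAG, the site content only appears in the prompt at query time, and the weights are never written to.

```python
import copy

# Hypothetical stand-in for a pretrained model's parameters.
model_weights = {"layer0": [0.12, -0.40], "layer1": [0.88]}


def build_prompt(question, passages):
    """Place retrieved site passages into the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Snapshot the weights, run the retrieval-augmented step, then compare.
weights_before = copy.deepcopy(model_weights)
prompt = build_prompt(
    "What does the starter plan cost?",
    ["Our pricing starts at ten dollars per month."],
)
weights_unchanged = model_weights == weights_before
```

Fine-tuning, by contrast, would run a training loop that updates `model_weights`. Nothing in the retrieval path does that.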
🔐 Control & Crawling Permissions
Also note:
You can set up crawlers and decide what parts of your site are indexed (including handling protected pages with custom headers).
Cloudflare provides tools (like AI Crawl Control and Content Signals / updated robots.txt rules) so you can manage whether AI crawlers can access the site at all, and whether outside parties may use your content for training or indexing.
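As a rough illustration of the robots.txt side, the sketch below renders a `Content-Signal` line. The directive and the signal names (`search`, `ai-input`, `ai-train`) reflect my understanding of Cloudflare’s Content Signals Policy and should be checked against the current documentation. `render_robots_txt` is a hypothetical helper, not part of any Cloudflare tool.

```python
def render_robots_txt(signals, user_agent="*"):
    """Render a minimal robots.txt with a Content-Signal line.

    signals: dict mapping a signal name to True (yes) or False (no).
    The signal names used below are assumptions; verify before use.
    """
    signal_line = ", ".join(
        f"{name}={'yes' if allowed else 'no'}" for name, allowed in signals.items()
    )
    lines = [
        f"User-agent: {user_agent}",
        f"Content-Signal: {signal_line}",
        "Allow: /",
    ]
    return "\n".join(lines) + "\n"


# Allow search indexing and AI answers, but opt out of model training.
robots = render_robots_txt({"search": True, "ai-input": True, "ai-train": False})
```

Signals like these express a preference to crawlers. Actually blocking a crawler is a separate control, such as the AI Crawl Control settings mentioned above.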