Cloudflare has launched its new Content Signals Policy, a free tool designed to give website owners and publishers more control over how their content is used by AI companies and other crawlers. The policy works by updating a website's robots.txt file, which is a standard file that provides instructions to web crawlers. While traditional robots.txt files only allow owners to specify which parts of their site crawlers can access, the new policy enables them to state how the content can be used after it has been accessed. This is a crucial distinction as the internet shifts from "search engines" that provide links to "answer engines" powered by AI that give direct answers without requiring users to visit the source site.
- Cloudflare has launched a new Content Signals Policy to give website owners more control over how their content is used.
- The policy extends the robots.txt file, letting owners express preferences for how AI crawlers use their content.
- Owners can opt out of uses such as AI overviews and model training, addressing concerns about content being used for profit without attribution or traffic.
- The policy defines clear signals: "yes" grants permission, "no" denies it, and the absence of a signal means no preference is expressed.
- It aims to strengthen communication between website owners and bot operators, encouraging better respect for creator preferences.
- The tool is free and is applied automatically for Cloudflare customers who use its managed robots.txt service.
The rise of AI-powered "answer engines" poses a significant challenge to the traditional internet business model, where creators and publishers monetize content through website traffic. AI crawlers can scrape vast amounts of data, using it to provide direct answers and summaries without directing users back to the original source. To combat this, Cloudflare's Content Signals Policy provides a new, machine-readable way for website operators to express their preferences for how their data is used for commercial purposes, including AI input and AI training. This policy gives creators a stronger voice in how their intellectual property is leveraged.
The new policy integrates with a website’s existing robots.txt file. Cloudflare has published a clear set of instructions for crawlers, reminding them that these signals can have legal significance. The instructions are simple:
- "Yes" means a specific use is allowed.
- "No" means it is not allowed.
- The absence of a signal means no preference is expressed.
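In practice, these signals appear as a `Content-Signal` line inside robots.txt. The sketch below is illustrative rather than a copy of Cloudflare's managed file (which includes a longer policy statement in comments); the signal names shown, `search`, `ai-input`, and `ai-train`, follow Cloudflare's published policy:

```
# The Content Signals Policy text appears here as comments, stating
# that the signals below express the operator's preferences for use
# of the site's content.

# Allow search indexing; opt out of AI answers and model training.
Content-Signal: search=yes, ai-input=no, ai-train=no

User-agent: *
Allow: /
```

Note that existing `Allow`/`Disallow` rules still control *access*; the `Content-Signal` line expresses preferences about *use* after access.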
Cloudflare is automatically updating the robots.txt files for millions of its customers with this new policy language. While the robots.txt file is not a technical barrier to unwanted scraping, this enhanced policy language aims to provide a clearer signal to bot operators, encouraging them to respect the preferences of content creators. The company believes that giving website owners this control is essential for ensuring the internet remains a thriving and open ecosystem.
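To illustrate how a well-behaved bot operator could honor these preferences, here is a minimal Python sketch that reads `Content-Signal` lines of the form `name=yes|no, ...` from a robots.txt file. The `parse_content_signals` helper is hypothetical, not part of any Cloudflare tooling:

```python
# Hypothetical sketch: parse "Content-Signal:" lines from robots.txt
# so a crawler can check the operator's expressed preferences.

def parse_content_signals(robots_txt: str) -> dict[str, bool]:
    """Return {signal_name: allowed} for each signal expressed.

    Signals absent from the file are simply missing from the dict,
    matching the policy's "no signal = no preference expressed".
    """
    signals: dict[str, bool] = {}
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line.lower().startswith("content-signal:"):
            continue
        _, _, value = line.partition(":")
        for pair in value.split(","):
            name, sep, setting = pair.strip().partition("=")
            if sep:
                signals[name.strip().lower()] = setting.strip().lower() == "yes"
    return signals


if __name__ == "__main__":
    sample = "Content-Signal: search=yes, ai-input=no, ai-train=no\nUser-agent: *\nAllow: /\n"
    prefs = parse_content_signals(sample)
    # A crawler would consult e.g. prefs.get("ai-train") before using
    # the content for model training; None means no preference expressed.
    print(prefs)
```

As the article notes, robots.txt is not a technical barrier, so a check like this only matters for crawlers that choose to comply.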
Cloudflare, Inc. is the leading connectivity cloud company on a mission to help build a better Internet. It empowers organizations to make their employees, applications, and networks faster and more secure everywhere, while reducing complexity and cost. Cloudflare's connectivity cloud delivers the most full-featured, unified platform of cloud-native products and developer tools, so any organization can gain the control it needs to work, develop, and accelerate its business.