Bugcrowd Launches RL Environments for AI Security Model Training


by: PR Newswire | May 22, 2026

Bugcrowd, the leader in preemptive cybersecurity, has launched Reinforcement Learning Environments, a new offering that gives AI developers the infrastructure to train models on real software vulnerabilities rather than synthetic approximations. Built on technology acquired through Bugcrowd's purchase of Mayhem Security, the product is available immediately and is already in active use by leading large language model providers working to build more security-capable AI systems. The launch addresses a fundamental gap in how AI security models are currently trained and positions Bugcrowd as a key infrastructure provider for the frontier AI development community.

Quick Intel

  • Bugcrowd has launched Reinforcement Learning Environments, enabling AI developers to train models on real software vulnerabilities across the full cycle of finding, exploiting, and fixing security flaws.
  • The product is built on technology from Bugcrowd's acquisition of Mayhem Security and is already being used by leading LLM providers.
  • The platform includes hundreds of thousands of training environments built from authentic open-source vulnerabilities with real source code and verifiable outcomes.
  • All environments are derived exclusively from open-source software, with no customer data or security researchers involved at any stage of the training process.
  • The offering eliminates years of engineering effort typically required to build training infrastructure of this caliber, giving frontier AI teams immediate access to enterprise-grade environments.
  • Bugcrowd's RL Environments cover detection, exploitation, patching, and audit, addressing the full scope of real-world security reasoning rather than stopping at vulnerability detection alone.

Closing the Gap Between AI Security Training and Real-World Vulnerabilities

The central problem Bugcrowd's RL Environments are designed to solve is one that has undermined the credibility of AI security models: the reliance on synthetic training data that does not reflect how real vulnerabilities behave. Models trained on approximations of security problems can perform well in controlled test environments yet struggle when confronted with actual software flaws in production.

"The gap between what AI agents are trained on and what they encounter in the real world is where security breaks down," said Dave Gerry, Chief Executive Officer at Bugcrowd. "Our RL Environments give frontier teams the infrastructure to build AI that learns security from real vulnerabilities, not approximations of them."

The distinction matters because identifying and exploiting vulnerabilities requires a set of specialized, sequenced skills that synthetic data cannot adequately replicate. Locating a bug, triggering it, assessing its exploitability, and then fixing it without breaking the underlying application are fundamentally different tasks that compound in complexity. Effective AI security training must address all of them.

How Reinforcement Learning Environments Work in Practice

Bugcrowd's RL Environments operate on the core premise of reinforcement learning: agents improve through cycles of action and feedback. Rather than reading about security problems in a static dataset, agents work directly with real, vulnerable software, attempting to find bugs, exploit them, and fix them. At each stage, the agent receives immediate, scored feedback on its performance, and the model improves through that iterative loop.
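As a rough illustration of that action-and-feedback loop, the following Python sketch runs a toy agent through find, exploit, and fix stages and updates its preferences from the score it receives at each stage. Everything in it is hypothetical: `ToyVulnEnv`, the action names, and the scoring rule are assumptions made for illustration and do not represent Bugcrowd's product, API, or graders. The answer-key lookup merely stands in for running real vulnerable software.

```python
from dataclasses import dataclass

# Stage names follow the article's description; everything else is hypothetical.
STAGES = ("find", "exploit", "fix")

# Hypothetical candidate actions per stage (illustrative names only).
ACTIONS = {
    "find": ["flag_logger", "flag_parser", "flag_auth"],
    "exploit": ["replay_token", "path_traversal", "overflow_input"],
    "fix": ["add_bounds_check", "disable_feature", "rename_variable"],
}

# Hypothetical answer key; a real grader would execute real code instead.
ANSWERS = {"find": "flag_parser", "exploit": "overflow_input", "fix": "add_bounds_check"}


@dataclass
class ToyVulnEnv:
    """Toy stand-in for a single training environment."""
    answers: dict
    stage_index: int = 0

    def reset(self) -> str:
        """Start a new episode at the first stage."""
        self.stage_index = 0
        return STAGES[0]

    def step(self, action: str):
        """Score the action for the current stage, then advance one stage."""
        stage = STAGES[self.stage_index]
        reward = 1.0 if action == self.answers[stage] else 0.0
        self.stage_index += 1
        done = self.stage_index == len(STAGES)
        next_stage = None if done else STAGES[self.stage_index]
        return next_stage, reward, done


def train(env: ToyVulnEnv, episodes: int = 5, alpha: float = 0.5):
    """Greedy agent with optimistic initial values; returns per-episode scores."""
    values = {s: {a: 1.0 for a in ACTIONS[s]} for s in STAGES}
    returns = []
    for _ in range(episodes):
        stage = env.reset()
        total, done = 0.0, False
        while not done:
            # Pick the action currently believed to be best for this stage.
            action = max(ACTIONS[stage], key=lambda a: values[stage][a])
            next_stage, reward, done = env.step(action)
            # Move the estimate toward the score the environment returned.
            values[stage][action] += alpha * (reward - values[stage][action])
            total += reward
            stage = next_stage
        returns.append(total)
    return returns, values


returns, values = train(ToyVulnEnv(ANSWERS))
print(returns)  # [1.0, 2.0, 3.0, 3.0, 3.0]
```

In this toy setup the per-episode score rises as the agent discards actions that scored zero, which mirrors the iterative improvement described above at a vastly smaller scale.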

The platform provides hundreds of thousands of ready-to-use training environments, each built from authentic open-source vulnerabilities with real source code and verifiable outcomes. No additional infrastructure setup is required, meaning frontier AI teams can begin training immediately without the multi-year engineering investment that building comparable infrastructure from scratch would typically demand. All environments are derived exclusively from open-source software, and no customer data or security researchers are involved at any point in the training process.

Training That Goes Beyond Detection to Full Security Reasoning

One of the defining characteristics of Bugcrowd's offering is its scope. Most existing AI security training approaches focus on vulnerability detection and stop there. Bugcrowd's RL Environments extend the training cycle through exploitation, patching, and audit, covering the full chain of skills that constitute genuine security competence.

"Most AI security training stops too early. Models learn to find bugs, but not to prove the bugs are real and exploitable. You cannot train a model to be good at security by showing it what security looks like, you have to give it real problems to solve and honest feedback on whether it solved them. At Bugcrowd, we have spent years building the environments, graders, and reward structures that take models further, from detection through exploitation, patching, and audit. That is what real security skill looks like, and it is what we are making available to frontier AI teams today," said Dr. David Brumley, Chief AI and Science Officer at Bugcrowd.

This breadth of coverage is significant for LLM providers and frontier AI research teams that need agents capable of end-to-end security reasoning. A model that can detect a vulnerability but cannot assess its exploitability or apply a verified fix has limited practical utility in real-world security operations, and Bugcrowd's training framework is structured to address that limitation directly.
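One way to picture a verified fix is a grader that rewards a patch only when a known exploit input stops working and normal behavior is preserved. The Python sketch below is a toy under stated assumptions: the flawed function, the candidate patches, and `grade_patch` are all hypothetical and are not drawn from Bugcrowd's environments or graders.

```python
def vulnerable_read(data: bytes, index: int) -> int:
    """Toy flaw: a negative index silently reads from the end of the buffer."""
    return data[index]


def good_patch(data: bytes, index: int) -> int:
    """Hypothetical fix: reject out-of-range indexes, keep valid reads working."""
    if not 0 <= index < len(data):
        raise ValueError("index out of range")
    return data[index]


def overbroad_patch(data: bytes, index: int) -> int:
    """Hypothetical bad fix: blocks the exploit by breaking every read."""
    raise ValueError("reads disabled")


def grade_patch(read_fn) -> float:
    """Return 1.0 only if the exploit is blocked AND valid reads still work."""
    sample = b"abc"

    # Check 1: the known exploit input (index -1) must now be rejected.
    try:
        read_fn(sample, -1)
        exploit_blocked = False
    except ValueError:
        exploit_blocked = True

    # Check 2: every in-range read must still return the right byte.
    try:
        functionality_preserved = all(
            read_fn(sample, i) == sample[i] for i in range(len(sample))
        )
    except Exception:
        functionality_preserved = False

    return 1.0 if exploit_blocked and functionality_preserved else 0.0


for candidate in (vulnerable_read, good_patch, overbroad_patch):
    print(candidate.__name__, grade_patch(candidate))
```

Only `good_patch` earns the reward here; the unpatched function fails the exploit check and the overbroad patch fails the functionality check, which is the "fix it without breaking the application" requirement in miniature.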

Built on the Mayhem Security Acquisition

The launch of RL Environments extends the technical foundation Bugcrowd established through its acquisition of Mayhem Security, which brought autonomous code and API testing capabilities into the platform. That acquisition positioned Bugcrowd to move upstream in the AI security infrastructure stack. RL Environments are the next step in that trajectory, giving frontier AI labs the training infrastructure to build security-aware agents at scale rather than solely testing agents against existing codebases.

The product is designed for large language model providers and frontier AI research teams that need agents capable of real-world security reasoning but do not want to spend years building training infrastructure themselves. Bugcrowd frames it as a response to the pressure to move quickly that defines the current phase of frontier model development.

The launch of Bugcrowd's Reinforcement Learning Environments arrives at a moment when the security capabilities of AI models are under increasing scrutiny from both enterprise buyers and the broader research community. As AI agents take on more active roles in software development and security operations, the quality of the training data and environments shaping their security reasoning will have direct consequences for the reliability and safety of the systems they support. By providing training infrastructure grounded in real vulnerabilities, verifiable outcomes, and full-cycle security reasoning, Bugcrowd is positioning itself at a critical juncture in how the next generation of security-capable AI models gets built.


About Bugcrowd

Bugcrowd is the preemptive security platform that unifies exposure discovery and assessment, offensive testing, and intelligence shaped by AI and human insight to help organizations discover, validate, and reduce real-world risk. Bugcrowd helps security teams move faster by identifying the exposures that matter most so they can act first and stay ahead of attackers. By combining the power of humans and AI, teams can preempt attack paths and prevent breaches.

  • Cyber Security, AI Security, Frontier AI