Lakera Launches Open-Source Security Benchmark for AI Agents

by:
|
October 29, 2025

As AI agents become more integral to enterprise operations, securing the large language models (LLMs) that power them is a critical and complex challenge. To address this, Lakera, in collaboration with Check Point Software Technologies and researchers from The UK AI Security Institute, has launched the backbone breaker benchmark (b3), an open-source security evaluation specifically designed for the LLMs within AI agents.

Quick Intel

Lakera and Check Point have launched the open-source "backbone breaker benchmark" (b3) for AI agent security.
The benchmark introduces "threat snapshots" to test LLMs at critical, vulnerable points in agent workflows.
It evaluates susceptibility to attacks like prompt exfiltration, code injection, and phishing.
Initial tests of 31 popular LLMs show enhanced reasoning improves security, not model size.
The benchmark uses a dataset from the "Gandalf: Agent Breaker" red teaming game.
The tool provides developers a realistic way to measure and improve their AI security posture.

A New Approach to AI Security Testing

The b3 benchmark is built around a novel concept called "threat snapshots." This methodology moves away from simulating an entire AI agent workflow and instead focuses on pinpointing the critical moments where vulnerabilities in the underlying LLM are most likely to be exploited. This allows for more efficient and realistic adversarial testing without the overhead of modeling a full agent.

Pinpointing Hidden Vulnerabilities

Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera, explained the benchmark's purpose, stating, “We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them. Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure, and improve, their security posture.”

The benchmark combines 10 representative agent threat snapshots with a high-quality dataset of 19,433 crowdsourced adversarial attacks collected via the gamified red teaming game, Gandalf: Agent Breaker. It tests for critical vulnerabilities including system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorized tool calls.

Key Insights from Initial Model Testing

The initial release of the b3 benchmark includes revealing data from testing 31 popular LLMs. The results provide several key insights for the AI security community, indicating that enhanced reasoning capabilities significantly improve security, model size does not correlate with security performance, and while closed-source models generally outperform open-weight models, the gap is narrowing.

This launch represents a significant step towards standardizing security evaluations in the rapidly evolving field of Agentic AI. By providing an open-source tool, Lakera and its partners aim to equip the developer community with the means to build more robust and secure AI applications, fostering a proactive approach to identifying and mitigating risks in complex AI agent systems.

About Lakera

Lakera, a Check Point company, is a world leading AI-native security platform for Agentic AI applications, protecting Fortune 500 enterprises and leading technology companies from emerging AI cyber risks. Lakera’s defenses evolve in real-time thanks to Gandalf, the world’s largest red teaming community, and their proprietary AI. Lakera was founded by David Haber, Dr. Mateo Rojas-Carulla and Dr. Matthias Kraft in 2021, and was acquired by Check Point (NASDAQ: CHKP) in 2025. The company is dual-headquartered in Zurich and San Francisco.

About Check Point Software Technologies Ltd.

Check Point Software Technologies Ltd. is a leading protector of digital trust, utilizing AI-powered cyber security solutions to safeguard over 100,000 organizations globally. Through its Infinity Platform and an open garden ecosystem, Check Point’s prevention-first approach delivers industry-leading security efficacy while reducing risk. Employing a hybrid mesh network architecture with SASE at its core, the Infinity Platform unifies the management of on-premises, cloud, and workspace environments to offer flexibility, simplicity and scale for enterprises and service providers.

AI SecurityLLMAI AgentsOpen SourceCybersecurity

Join 110k+ Avid Tech Readers!

Trending tech news, interviews & insights straight to your inbox.

Lakera Launches Open-Source Security Benchmark for AI Agents

Quick Intel

A New Approach to AI Security Testing

Pinpointing Hidden Vulnerabilities

Key Insights from Initial Model Testing

About Lakera

About Check Point Software Technologies Ltd.

Join 110k+ Avid Tech Readers!

About Us

Quick Links

Connect With Us

Search TechIntelPro

Subscribe to Our Newsletter

Lakera Launches Open-Source Security Benchmark for AI Agents

Quick Intel

A New Approach to AI Security Testing

Pinpointing Hidden Vulnerabilities

Key Insights from Initial Model Testing

About Lakera

About Check Point Software Technologies Ltd.

Join 110k+ Avid Tech Readers!

About Us

Quick Links

Connect With Us