DryRun Security Research: Claude Generates Most Unresolved Security Flaws in AI-Built Apps


March 11, 2026

DryRun Security, the industry's first AI-native code security intelligence company, has released The Agentic Coding Security Report, new research examining how leading AI coding agents perform when building real applications. The study found that while AI coding agents significantly accelerate software development, they also consistently introduce security vulnerabilities during the development process, with Anthropic's Claude producing the highest number of unresolved high-severity security flaws in the final applications.

Quick Intel

  • DryRun Security released The Agentic Coding Security Report, evaluating Claude, Codex, and Gemini building two full applications through sequential pull requests.

  • 26 of 30 pull requests (87%) introduced at least one vulnerability, with 143 security issues identified across 38 security scans.

  • Claude produced the highest number of unresolved high-severity vulnerabilities in final applications.

  • Codex finished with the fewest vulnerabilities and demonstrated stronger remediation behavior during development.

  • Four authentication-related weaknesses appeared in every final codebase: insecure JWT verification, lack of brute force protections, token replay vulnerability, and insecure refresh token defaults.

  • None of the agents produced a fully secure application, with the same vulnerability classes recurring across all agents.

AI Coding Agents: Fast but Not Secure by Default

The study evaluated three leading coding agents—Claude, Codex, and Gemini—as they developed two full applications through sequential pull requests, mirroring how real engineering teams implement features over time. Across the study, 26 of 30 pull requests (87%) introduced at least one vulnerability, with 143 security issues identified across 38 security scans. The same vulnerability classes appeared repeatedly across all agents, and none of the agents produced a fully secure application.

"AI coding agents can produce working software at incredible speed, but security isn't part of their default thinking," said James Wickett, CEO of DryRun Security. "In our usage and experience, AI coding agents often missed adding security components or created authentication logic flaws. These mistakes and gaps are exactly where attackers win."

Claude Leads in Unresolved High-Severity Vulnerabilities

While all three agents introduced security flaws during development, the study showed clear differences in their final security posture. Claude produced the highest number of unresolved high-severity vulnerabilities in the final applications. Codex finished with the fewest vulnerabilities and demonstrated stronger remediation behavior during development. Gemini introduced multiple issues early in its work and removed some of them through later modifications, but it still ended with several high-severity findings. Despite these differences, no agent produced a fully secure application.

Recurring Security Failures Across Every Codebase

Several vulnerability classes appeared consistently across both applications and all agents, many aligned with the OWASP Top 10. Four weaknesses appeared in every final codebase, all related to authentication:

  • Insecure JWT verification and management

  • Lack of application-level brute force protections

  • Exposure to token replay attacks

  • Insecure defaults for refresh token cookie configurations

In multiple cases, agents implemented security mechanisms but failed to apply them consistently across the system. For example, authentication middleware was created for REST APIs but never applied to WebSocket endpoints, leaving parts of the application exposed.
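One way to avoid this kind of gap is to make authentication the default for every route, whatever the transport, so that a new endpoint cannot be registered unguarded by accident. The sketch below is a framework-agnostic illustration in Python, not code from the study. The `Router` class, the `Request` type, the route paths, and the `VALID_TOKENS` stand-in for real token verification are all hypothetical.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Request:
    transport: str               # "http" or "websocket"
    path: str
    token: Optional[str] = None


Handler = Callable[[Request], str]


class Unauthorized(Exception):
    """Raised when a guarded route is called without a valid token."""


# Stand-in for real verification (for example, a JWT check).
VALID_TOKENS = {"demo-token"}


def require_auth(handler: Handler) -> Handler:
    """Wrap a handler so it runs only for authenticated requests."""
    @wraps(handler)
    def guarded(request: Request) -> str:
        if request.token not in VALID_TOKENS:
            raise Unauthorized(f"{request.transport} {request.path}")
        return handler(request)
    return guarded


class Router:
    """Registers handlers for both transports through one guarded path."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Handler] = {}

    def add(self, transport: str, path: str, handler: Handler,
            public: bool = False) -> None:
        # Secure by default: a route is guarded unless marked public.
        self._routes[(transport, path)] = handler if public else require_auth(handler)

    def dispatch(self, request: Request) -> str:
        return self._routes[(request.transport, request.path)](request)


router = Router()
router.add("http", "/api/allergies", lambda r: "allergy list")
router.add("websocket", "/ws/race", lambda r: "race stream")
router.add("http", "/health", lambda r: "ok", public=True)
```

Because both the REST route and the WebSocket route pass through `Router.add`, the same check applies to each. Skipping authentication requires an explicit `public=True`, which is easier to spot in review than a missing middleware call.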

The Need for Continuous Security Review in Agentic Development

For the study, DryRun designed two applications—a web app to track family allergies and a browser-based racing game—and had each agent build features incrementally through pull requests, much like real-life agentic development. Each change was analyzed with DryRun Security before the next feature was implemented, followed by a full DeepScan of the final codebases. The results show that security risk accumulates quickly during agent-driven development if code is not reviewed continuously and remediated as part of the process.

DryRun Security's Contextual Security Analysis evaluates how applications behave in context, allowing teams to identify the systemic security gaps introduced by AI-generated code.

About DryRun Security

DryRun Security is the industry's first AI-native, agentic code security intelligence solution. Powered by its proprietary Contextual Security Analysis engine, DryRun Security helps security and developer teams reduce noise, surface real risk, and secure modern software built by both humans and autonomous agents. DryRun Security saves organizations thousands of hours otherwise spent on false positives, manual triage, and reactive reviews, while enabling security to scale with the speed and complexity of AI-driven development.
