Quesma BinaryAudit Tests AI for Binary Security

February 17, 2026

Quesma, Inc. has introduced BinaryAudit, an independent, open-source benchmark designed to evaluate whether leading AI models can identify hidden malicious threats in software binaries before they are exploited. Developed in collaboration with world-class reverse engineer Michał "Redford" Kowalczyk, the benchmark reveals both encouraging potential and clear current limitations in AI-powered binary analysis for supply-chain security.

Quick Intel

Quesma announces BinaryAudit, an open-source benchmark testing AI's ability to detect threats in software binaries.
The best-performing model, Claude Opus 4.6, achieved only a 49% success rate in identifying malicious code.
AI frequently flagged safe software as dangerous, underscoring accuracy challenges.
Benchmark addresses rising supply-chain attacks, including Notepad++, Shai Hulud 2.0, XZ Utils, and vendor-inserted vulnerabilities.
AI could shift binary reverse engineering from a reactive, specialist-only process to a proactive, scalable defense.
Available now at https://quesma.com/benchmarks/binaryaudit/ to track and drive progress in AI binary analysis.

Supply-chain attacks continue to inflict significant damage across industries. Recent high-profile incidents include the state-sponsored hijacking of Notepad++ binaries; the Shai Hulud 2.0 campaign, which compromised thousands of organizations, including Fortune 500 companies and governments, to steal credentials; and the XZ Utils backdoor, inserted by a long-term contributor who had gained ownership access. Additional risks stem from the vendor side, such as manufacturer-planted code used to disable trains and hardcoded credentials discovered in Cisco devices. These known cases represent only a portion of the broader threat landscape.

The Role of AI in Binary Analysis

Traditional binary reverse engineering remains a reactive, resource-intensive technique reserved for a limited number of specialists and typically performed only after a breach or major incident. AI offers the possibility to transform this approach into a proactive security layer, enabling organizations to inspect binaries routinely—before deployment, during updates, prior to procurement, or even years after initial release. This shift could make supply-chain security more preventive and scalable.

“We were genuinely surprised that today’s LLMs can detect malicious code at all. At current performance levels, it’s an assistant, not a solution,” said Jacek Migdał, CEO of Quesma. “AI binary analysis could be a new layer of defence in supply-chain security. We hope new AI models released in the next 1-2 years will make binary analysis go mainstream. BinaryAudit helps to track and encourage progress in this field.”

Benchmark Insights and Future Potential

BinaryAudit provides a standardized way to measure AI performance in detecting hidden threats within binaries. While current frontier models demonstrate some capability to identify malicious patterns, the 49% success rate of the top performer—coupled with frequent false positives—indicates that AI remains an assistive tool rather than a standalone solution. The benchmark is publicly available to foster ongoing development and improvement in this emerging area of cybersecurity.
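To make the distinction between the success rate and the false-positive problem concrete, the two figures can be computed separately from a set of model verdicts. The sketch below is illustrative only: the `Verdict` record, the function names, and the sample data are hypothetical and are not taken from BinaryAudit, whose actual scoring method may differ.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One model judgment on one binary (hypothetical record layout)."""
    has_backdoor: bool  # ground truth: does the binary contain malicious code?
    flagged: bool       # did the model report the binary as malicious?


def detection_rate(verdicts):
    """Share of truly malicious binaries that the model flagged."""
    malicious = [v for v in verdicts if v.has_backdoor]
    if not malicious:
        return 0.0
    return sum(v.flagged for v in malicious) / len(malicious)


def false_positive_rate(verdicts):
    """Share of clean binaries that the model wrongly flagged."""
    clean = [v for v in verdicts if not v.has_backdoor]
    if not clean:
        return 0.0
    return sum(v.flagged for v in clean) / len(clean)


# Made-up sample for illustration, not benchmark data.
sample = [
    Verdict(has_backdoor=True, flagged=True),
    Verdict(has_backdoor=True, flagged=False),
    Verdict(has_backdoor=False, flagged=True),
    Verdict(has_backdoor=False, flagged=False),
    Verdict(has_backdoor=False, flagged=False),
    Verdict(has_backdoor=False, flagged=False),
]

print(f"detection rate: {detection_rate(sample):.0%}")
print(f"false-positive rate: {false_positive_rate(sample):.0%}")
```

Reporting both numbers matters because a model that flags every binary would catch every threat while being useless in practice; a high detection rate only counts if clean software is rarely flagged.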

About Quesma

Quesma is a technology company that evaluates and tests advanced AI models. It creates benchmarks that measure how frontier LLMs perform across critical domains such as DevOps, security, and database migrations. Quesma is backed by Heartcore Capital, Inovo, Firestreak Ventures, and several angels, including Christian Beedgen, co-founder of Sumo Logic.

Supply Chain Security, Cybersecurity, AI Security
