Runloop.ai, the leading enterprise infrastructure platform for AI agents, announced the launch of its Custom Benchmarks product on October 1, 2025. This offering allows organizations to create specialized, private benchmarks for evaluating and refining AI agents using proprietary codebases and business logic. In collaboration with Fermatix.ai, a full-cycle data generation specialist, Runloop.ai is conducting a landmark pilot to demonstrate the product's applications.
Generic benchmarks fall short of enterprise requirements once AI agents reach production. Custom Benchmarks provide a secure environment for rigorous testing, along with data for model refinement and performance metrics tailored to each organization's tasks. "As AI agents move from prototypes to production, the benchmarks we use to evaluate them must evolve from generic tests to strategic assets," said Jonathan Wall, CEO of Runloop.ai.
Fermatix.ai, known for expert-level training data annotated by industry professionals, will use Runloop.ai's infrastructure to build custom, multilingual benchmarks. The pilot moves beyond data labeling toward reusable verification standards. "This partnership represents a strategic evolution: creating reusable benchmarks that deliver ongoing value," said Sergey Anchutin, CEO and Founder of Fermatix.ai.
Together, Runloop.ai's Custom Benchmarks and the Fermatix.ai pilot aim to help enterprises fine-tune AI agents for complex scenarios and improve their reliability and efficiency.
Runloop provides infrastructure and tooling for building, testing, refining, and deploying AI agents at scale. Founded by engineers with deep experience building large-scale systems, the company offers secure, isolated environments, rich developer tooling, and a suite of benchmarking capabilities that help companies deploy and manage AI agents with confidence.