F5 has expanded its collaboration with NVIDIA to enhance AI inference economics through tighter integration of F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs. This combination creates an intelligent, telemetry-aware infrastructure layer that significantly boosts token throughput, GPU utilization, and overall efficiency while enabling secure multi-tenant AI platforms at scale.
Enterprises and GPU-as-a-Service providers are transitioning from AI experimentation to revenue-generating inference services. Success now hinges on token economics—metrics such as sustained token throughput, time to first token (TTFT), cost per token, and revenue per accelerator—rather than raw GPU capacity alone.
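These metrics are straightforward to compute from throughput and pricing. The sketch below uses purely hypothetical figures (GPU-hour cost, sustained tokens per second, token price) chosen for illustration; they are not benchmark results from F5 or NVIDIA.

```python
# Illustrative token-economics arithmetic. All input figures are
# hypothetical examples, not measured or published numbers.

def cost_per_token(gpu_hour_cost: float, tokens_per_second: float) -> float:
    """Dollar cost of one generated token on one accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_cost / tokens_per_hour

def revenue_per_accelerator_hour(tokens_per_second: float,
                                 price_per_million_tokens: float) -> float:
    """Hourly revenue one accelerator earns at a given token price."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * price_per_million_tokens

# Hypothetical inputs: $3.00/GPU-hour, 2,500 tokens/s sustained,
# $0.50 per million output tokens.
print(f"cost per token:        ${cost_per_token(3.00, 2500):.8f}")
print(f"revenue per GPU-hour:  ${revenue_per_accelerator_hour(2500, 0.50):.2f}")
```

The same arithmetic shows why throughput gains compound: a higher sustained token rate simultaneously lowers cost per token and raises revenue per accelerator-hour.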
Traditional inference architectures often suffer from inefficient routing, queuing delays, and underutilized resources. The F5-NVIDIA solution addresses these by making infrastructure inference-aware. BIG-IP Next uses real-time telemetry to direct traffic intelligently before execution, reducing re-compute, latency, and waste while increasing GPU yield.
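The idea of routing on real-time telemetry rather than round-robin load balancing can be sketched in a few lines. This is a conceptual illustration only, assuming each GPU backend reports queue depth, estimated TTFT, and utilization; the field names and scoring function are invented for the example and do not describe F5's actual implementation.

```python
# Conceptual sketch of telemetry-aware routing: send each request to the
# backend expected to start generating soonest. Telemetry fields and the
# scoring heuristic are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class BackendTelemetry:
    name: str
    queue_depth: int    # requests already waiting
    est_ttft_ms: float  # backend's estimated time to first token
    gpu_util: float     # 0.0-1.0; (1 - gpu_util) is remaining headroom

def route(backends: list[BackendTelemetry]) -> BackendTelemetry:
    """Pick the backend with the lowest expected wait, penalizing
    deep queues and saturated GPUs."""
    def score(b: BackendTelemetry) -> float:
        return b.est_ttft_ms * (1 + b.queue_depth) / max(1e-6, 1.0 - b.gpu_util)
    return min(backends, key=score)

pool = [
    BackendTelemetry("gpu-a", queue_depth=4, est_ttft_ms=120.0, gpu_util=0.9),
    BackendTelemetry("gpu-b", queue_depth=1, est_ttft_ms=80.0,  gpu_util=0.6),
]
print(route(pool).name)  # → gpu-b
```

Routing before execution, as sketched here, is what avoids the queuing delays and wasted re-compute that blind distribution incurs.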
Independent testing by The Tolly Group confirmed substantial gains: 40% higher token throughput, 61% faster TTFT, and 34% lower overall latency. By offloading networking and security functions to BlueField-3 DPUs, the solution preserves CPU cycles and allows GPUs to focus on core inference tasks. These improvements require no changes to existing models, enabling immediate deployment across production AI factories.
Agent-driven workflows introduce persistent, context-rich interactions that demand smarter traffic control than conventional load balancing provides. The enhanced solution supports inference-aware routing tailored to agentic patterns, secure network-level multi-tenancy via dynamic VRFs, and simplified lifecycle management through NVIDIA DOCA integration.
“AI infrastructure is no longer just about access to GPUs or scaling deployments. It has evolved into maximizing economic output per accelerator,” said Kunal Anand, Chief Product Officer, F5. “Together with NVIDIA, we are enabling AI factories to treat token production as a measurable business metric. BIG-IP Next for Kubernetes provides the intelligence and governance required to increase GPU yield, reduce cost per token, and scale shared AI platforms confidently.”
“NVIDIA’s accelerated computing infrastructure coupled with F5’s AI-aware Application Delivery and Security Platform unlocks superior AI factory tokenomics—delivering scalable and cost-effective inference without making any changes to the models,” said Kevin Deierling, SVP, Networking, NVIDIA.
This joint advancement equips enterprises and NeoCloud providers with validated tools to optimize inference architecture, extract greater value from existing GPUs, lower operational costs, and build scalable, monetizable AI services ready for sustained agentic growth.
Supporting materials:
About F5
F5, Inc. is the global leader that delivers and secures every app. Backed by three decades of expertise, F5 has built the industry’s premier platform—F5 Application Delivery and Security Platform (ADSP)—to deliver and secure every app, every API, anywhere: on-premises, in the cloud, at the edge, and across hybrid, multicloud environments. F5 is committed to innovating and partnering with the world’s largest and most advanced organizations to deliver fast, available, and secure digital experiences. Together, we help each other thrive and bring a better digital world to life.