
F5 and NVIDIA Optimize AI Inference with BlueField-Accelerated BIG-IP Next

March 18, 2026

F5 has expanded its collaboration with NVIDIA to enhance AI inference economics through tighter integration of F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs. This combination creates an intelligent, telemetry-aware infrastructure layer that significantly boosts token throughput, GPU utilization, and overall efficiency while enabling secure multi-tenant AI platforms at scale.

Quick Intel

  • BIG-IP Next for Kubernetes leverages NVIDIA NIM statistics, Dynamo runtime signals, and GPU telemetry for inference-aware routing, matching workloads to optimal accelerators in real time.
  • Validated by The Tolly Group: up to 40% increase in token throughput, 61% faster time to first token (TTFT), and 34% reduction in request latency without model modifications.
  • Offloads networking, TLS/encryption, AI-aware load balancing, and traffic management to BlueField-3 DPUs, freeing host CPU and GPU resources for sustained high-throughput inference.
  • Enables secure multi-tenancy with EVPN-VXLAN dynamic VRFs, inference-aware routing for agentic workflows, and integration with NVIDIA DOCA Platform Framework for simplified DPU management.
  • Positions BIG-IP Next as a strategic control plane for AI factory economics, optimizing token consumption, reducing cost per token, and maximizing revenue per GPU accelerator.
  • Supports the shift to agent-driven, persistent, context-aware AI workloads with performance isolation and predictable SLAs in shared GPU environments.

Enterprises and GPU-as-a-Service providers are transitioning from AI experimentation to revenue-generating inference services. Success now hinges on token economics—metrics such as sustained token throughput, time to first token, cost per token, and revenue per accelerator—rather than raw GPU capacity alone.
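The token-economics metrics named above reduce to simple arithmetic. Here is a minimal sketch of two of them, cost per token and revenue per accelerator, using purely illustrative numbers (the GPU price, token price, and throughput figures are assumptions, not F5 or Tolly Group data):

```python
# Illustrative token-economics metrics for an inference service.
# All dollar figures and throughput numbers are hypothetical.

def cost_per_token(hourly_gpu_cost: float, tokens_per_second: float) -> float:
    """Cost of one generated token at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_gpu_cost / tokens_per_hour

def revenue_per_accelerator(price_per_million_tokens: float,
                            tokens_per_second: float) -> float:
    """Hourly revenue one GPU produces at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_million_tokens * tokens_per_hour / 1_000_000

baseline_tps = 1000.0                 # hypothetical sustained tokens/sec
optimized_tps = baseline_tps * 1.40   # +40% throughput, per the Tolly figure

print(cost_per_token(3.0, baseline_tps))          # $/token at baseline
print(cost_per_token(3.0, optimized_tps))         # $/token after uplift
print(revenue_per_accelerator(2.0, optimized_tps))
```

A 40% throughput uplift at constant GPU cost cuts cost per token by the same factor (1/1.4 ≈ 29% lower), which is why throughput gains translate directly into the economics the article describes.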

Traditional inference architectures often suffer from inefficient routing, queuing delays, and underutilized resources. The F5-NVIDIA solution addresses these by making infrastructure inference-aware. BIG-IP Next uses real-time telemetry to direct traffic intelligently before execution, reducing re-compute, latency, and waste while increasing GPU yield.
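Telemetry-aware routing of this kind can be sketched in a few lines: score each backend from live signals and send the request to the least-loaded one. The field names and weights below are assumptions for illustration, not BIG-IP Next, NIM, or Dynamo API names:

```python
# Hedged sketch: choosing an inference backend from live telemetry.
# queue_depth, gpu_util, and kv_cache_free are hypothetical signal names.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    queue_depth: int      # pending requests reported by the runtime
    gpu_util: float       # 0.0-1.0 utilization from GPU telemetry
    kv_cache_free: float  # 0.0-1.0 free KV-cache headroom

def score(b: Backend) -> float:
    # Lower is better: penalize queued work and saturated GPUs,
    # reward free KV-cache headroom (weights are arbitrary).
    return b.queue_depth + 2.0 * b.gpu_util - b.kv_cache_free

def route(backends: list[Backend]) -> Backend:
    """Pick the backend with the best (lowest) telemetry score."""
    return min(backends, key=score)

pool = [
    Backend("gpu-a", queue_depth=4, gpu_util=0.95, kv_cache_free=0.10),
    Backend("gpu-b", queue_depth=1, gpu_util=0.60, kv_cache_free=0.50),
]
print(route(pool).name)  # gpu-b
```

The point of routing before execution is visible even in this toy: the saturated backend never receives the request, so no tokens are queued behind work it cannot absorb.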

Validated Performance and Economic Uplift

Independent testing by The Tolly Group confirmed substantial gains: 40% higher token throughput, 61% faster TTFT, and 34% lower overall latency. By offloading networking and security functions to BlueField-3 DPUs, the solution preserves CPU cycles and allows GPUs to focus on core inference tasks. These improvements require no changes to existing models, enabling immediate deployment across production AI factories.

Built for Agentic and Multi-Tenant AI

Agent-driven workflows introduce persistent, context-rich interactions that demand smarter traffic control than conventional load balancing provides. The enhanced solution supports inference-aware routing tailored to agentic patterns, secure network-level multi-tenancy via dynamic VRFs, and simplified lifecycle management through NVIDIA DOCA integration.

“AI infrastructure is no longer just about access to GPUs or scaling deployments. It has evolved into maximizing economic output per accelerator,” said Kunal Anand, Chief Product Officer, F5. “Together with NVIDIA, we are enabling AI factories to treat token production as a measurable business metric. BIG-IP Next for Kubernetes provides the intelligence and governance required to increase GPU yield, reduce cost per token, and scale shared AI platforms confidently.”

“NVIDIA’s accelerated computing infrastructure coupled with F5’s AI-aware Application Delivery and Security Platform unlocks superior AI factory tokenomics—delivering scalable and cost-effective inference without making any changes to the models,” said Kevin Deierling, SVP, Networking, NVIDIA.

This joint advancement equips enterprises and NeoCloud providers with validated tools to optimize inference architecture, extract greater value from existing GPUs, lower operational costs, and build scalable, monetizable AI services ready for sustained agentic growth.

Supporting materials:

  • Blog: AI factories need intelligent infrastructure. New results from The Tolly Group show why.
  • Report: Independent testing by Tolly: F5 BIG-IP Next for Kubernetes

About F5

F5, Inc. is the global leader that delivers and secures every app. Backed by three decades of expertise, F5 has built the industry’s premier platform—F5 Application Delivery and Security Platform (ADSP)—to deliver and secure every app, every API, anywhere: on-premises, in the cloud, at the edge, and across hybrid, multicloud environments. F5 is committed to innovating and partnering with the world’s largest and most advanced organizations to deliver fast, available, and secure digital experiences. Together, we help each other thrive and bring a better digital world to life.

  • AI Factory
  • Tokenomics
  • Agentic AI