VAST Data has announced a new AI inference architecture developed in collaboration with NVIDIA, specifically designed for the era of agentic and long-context AI. By running its VAST AI Operating System (AI OS) natively on NVIDIA BlueField-4 DPUs, the company aims to collapse legacy storage tiers and create a shared, high-performance key-value (KV) cache infrastructure that ensures deterministic access for complex, multi-turn, and multi-agent inference workloads.
- VAST Data redesigns AI inference architecture with NVIDIA for agentic AI.
- The VAST AI OS runs natively on NVIDIA BlueField-4 DPUs for embedded data services.
- The solution creates a shared, pod-scale KV cache for long-context and multi-agent inference.
- It targets the elimination of bottlenecks and GPU idle time as concurrency scales.
- The architecture is built for the NVIDIA Inference Context Memory Storage Platform.
- It aims to improve power efficiency and provide policy-driven context management for production AI.
The announcement addresses a fundamental shift in AI workloads: as inference evolves from simple prompts to persistent, multi-turn reasoning across agents, managing the context (KV cache) becomes as critical as raw GPU compute. Performance is increasingly governed by how efficiently this inference history can be stored, shared, and accessed. John Mao, Vice President of Global Technology Alliances at VAST Data, explained the paradigm shift, stating, “Inference is becoming a memory system, not a compute job. The winners won’t be the clusters with the most raw compute – they’ll be the ones that can move, share, and govern context at line rate.”
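The economics behind this shift can be illustrated with a toy model. The sketch below is illustrative only (the store design, keying scheme, and names are assumptions, not VAST's API): it shows why reusing a stored KV cache for a shared conversation prefix means only the new suffix must be prefilled on each turn.

```python
# Illustrative sketch (hypothetical design, not VAST's API): reusing a
# stored KV cache avoids re-running prefill over shared conversation history.
from dataclasses import dataclass, field

@dataclass
class KVCacheStore:
    """Toy shared store keyed by token-prefix hash."""
    _store: dict = field(default_factory=dict)

    def put(self, prefix_tokens: tuple, kv_state: object) -> None:
        self._store[hash(prefix_tokens)] = kv_state

    def get(self, prefix_tokens: tuple):
        return self._store.get(hash(prefix_tokens))

def prefill_cost(tokens: tuple, cache: KVCacheStore) -> int:
    """Tokens that must be processed after reusing the longest cached prefix."""
    for cut in range(len(tokens), 0, -1):
        if cache.get(tokens[:cut]) is not None:
            return len(tokens) - cut  # only the new suffix is prefilled
    return len(tokens)                # cold start: full prefill

cache = KVCacheStore()
history = tuple(range(10_000))          # shared multi-turn history
cache.put(history, kv_state="...")      # stored after an earlier turn
follow_up = history + tuple(range(10_000, 10_050))
print(prefill_cost(follow_up, cache))   # 50 instead of 10_050
```

In a real system the KV state is tensor data and the store spans a fabric rather than a Python dict, but the arithmetic is the same: the cost of a turn scales with the uncached suffix, not the full context.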
The new architecture embeds critical data services directly into the GPU server via BlueField-4 DPUs, removing traditional client-server contention and unnecessary data copies. This, combined with VAST's Disaggregated Shared-Everything (DASE) architecture, allows each host to access a globally coherent context namespace over high-speed RDMA fabrics. The goal is to eliminate bottlenecks that inflate time-to-first-token (TTFT) and cause GPU idle time as session concurrency increases, ensuring predictable performance at scale.
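A back-of-envelope model makes the TTFT claim concrete. The numbers below are illustrative assumptions, not VAST or NVIDIA benchmarks: when a long context must be recomputed from scratch, prefill dominates TTFT; fetching cached KV state over a fast fabric trades most of that compute for a small transfer cost.

```python
# Back-of-envelope TTFT model (all figures are illustrative assumptions,
# not vendor benchmarks): TTFT ~= cache fetch time + prefill of uncached tokens.
def ttft_seconds(context_tokens: int, cached_tokens: int,
                 prefill_tok_per_s: float = 20_000.0,
                 cache_fetch_s: float = 0.05) -> float:
    uncached = max(context_tokens - cached_tokens, 0)
    return cache_fetch_s + uncached / prefill_tok_per_s

cold = ttft_seconds(context_tokens=100_000, cached_tokens=0)       # full recompute
warm = ttft_seconds(context_tokens=100_000, cached_tokens=99_000)  # shared KV cache hit
print(f"cold: {cold:.2f}s  warm: {warm:.2f}s")  # cold: 5.05s  warm: 0.10s
```

The same model also shows why the problem compounds with concurrency: every session recomputing its own copy of a shared prefix multiplies the cold-path cost, while a shared cache amortizes it once.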
Beyond raw speed, the platform is designed to bring production-grade governance to inference. As AI moves into regulated, revenue-driving services, organizations need policy-driven control, isolation, auditability, and lifecycle management for context memory. VAST's AI OS delivers these data services as part of the infrastructure, helping customers avoid inefficiency and "rebuild storms" as context sizes explode.
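To make "policy-driven control" less abstract, here is a hypothetical sketch of the kinds of rules such governance implies. Every name and field here is an illustrative assumption, not VAST's configuration schema: tenant isolation, retention lifecycle, and audit flags expressed declaratively, with one enforcement check.

```python
# Hypothetical policy sketch (field names are illustrative assumptions,
# not VAST's schema): declarative governance for shared context memory.
context_policy = {
    "tenant": "team-finance",
    "isolation": "per-tenant-namespace",   # no cross-tenant KV sharing
    "retention": {"ttl_hours": 24, "evict": "least-recently-used"},
    "audit": {"log_access": True, "log_eviction": True},
}

def can_share(cache_tenant: str, requester_tenant: str, policy: dict) -> bool:
    """Deny cross-tenant KV reuse when isolation is per-tenant."""
    if policy["isolation"] == "per-tenant-namespace":
        return cache_tenant == requester_tenant
    return True

print(can_share("team-finance", "team-finance", context_policy))  # True
print(can_share("team-finance", "team-hr", context_policy))       # False
```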
Kevin Deierling, Senior Vice President of Networking at NVIDIA, highlighted the critical role of context, saying, "Multi-turn and multi-user inferencing fundamentally transforms how context memory is managed at scale. VAST Data AI OS with NVIDIA BlueField-4 enables the NVIDIA Inference Context Memory Storage Platform and a coherent data plane designed for sustained throughput and predictable performance as agentic workloads scale.”
This collaboration represents a significant architectural advancement for AI factories and enterprises deploying large-scale agentic AI, positioning shared, high-performance context memory as a fundamental infrastructure service for the next generation of intelligent systems.
About VAST Data
VAST Data is the AI Operating System company – powering the next generation of intelligent systems with a unified software infrastructure stack that was purpose-built to unlock the full potential of AI. The VAST AI OS consolidates foundational data services, compute services, and agentic execution into one scalable platform, enabling organizations to deploy AI agents, facilitate communication between them, reason over real-time data, and automate complex workflows at global scale. Built on VAST’s breakthrough DASE architecture – the world’s first true parallel distributed system architecture that eliminates tradeoffs between performance, scale, simplicity, and resilience – VAST has transformed modern infrastructure into a global fabric for reasoning AI.