At CES 2026, NVIDIA unveiled a new class of AI-native storage infrastructure powered by its BlueField-4 data processor. The NVIDIA Inference Context Memory Storage Platform is designed to handle the vast context data generated by large-scale, agentic AI systems. It aims to overcome bottlenecks in real-time, multi-agent inference by extending GPU memory capacity and enabling high-speed sharing of context across AI clusters, delivering up to 5x more tokens per second and up to 5x better power efficiency than traditional storage.
- NVIDIA introduces the Inference Context Memory Storage Platform, a new AI-native storage infrastructure.
- It is powered by the NVIDIA BlueField-4 data processor and designed for gigascale agentic AI inference.
- The platform manages the Key-Value (KV) cache critical for long-context, multi-turn AI agent memory.
- It promises up to 5x greater tokens per second and 5x improved power efficiency versus traditional storage.
- Key capabilities include accelerated KV cache sharing across clusters via NVIDIA Spectrum-X Ethernet.
- Major storage vendors, including Dell, HPE, IBM, and Pure Storage, are building solutions with BlueField-4, available in the second half of 2026.
As AI systems evolve into intelligent, multi-step collaborators, they generate and rely on massive amounts of contextual data, stored as a Key-Value (KV) cache. Keeping this essential, ever-growing cache entirely in GPU memory creates a significant bottleneck for real-time inference and multi-agent systems. NVIDIA's new platform is engineered to provide a dedicated, scalable storage layer for this context memory, freeing GPU resources and enabling efficient data sharing across entire AI clusters to maintain continuity and accuracy in agentic workflows.
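To see why long-context KV caches outgrow GPU memory, a back-of-the-envelope sizing helps. The sketch below uses illustrative model dimensions (a hypothetical 70B-class model with grouped-query attention), not NVIDIA-published figures:

```python
# Rough KV cache sizing: keys + values, per layer, per token.
# All model dimensions below are illustrative assumptions.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Bytes of KV cache for one sequence (FP16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 70B-class model: 80 layers, 8 KV heads (GQA), head_dim 128.
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
ctx_1m = kv_cache_bytes(80, 8, 128, seq_len=1_000_000)

print(f"{per_token} bytes per token")           # 327680 bytes per token
print(f"{ctx_1m / 2**30:.0f} GiB at 1M tokens")  # ~305 GiB, beyond one GPU's HBM
```

At roughly 305 GiB for a single million-token sequence under these assumptions, the cache for even one long-running agent dwarfs the HBM of a single GPU, which is the bottleneck a dedicated context-memory storage tier addresses.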
The announcement underscores a fundamental shift in infrastructure requirements driven by advanced AI. Jensen Huang, founder and CEO of NVIDIA, stated, “AI is revolutionizing the entire computing stack — and now, storage. AI is no longer about one-shot chatbots but intelligent collaborators that understand the physical world, reason over long horizons, stay grounded in facts, use tools to do real work, and retain both short- and long-term memory. With BlueField-4, NVIDIA and our software and hardware partners are reinventing the storage stack for the next frontier of AI.”
The BlueField-4-powered platform is built as a full-stack solution. It uses hardware-accelerated KV cache placement managed by BlueField-4 to eliminate metadata overhead and ensure secure, isolated access. Tight integration with the NVIDIA DOCA framework, NIXL library, and Dynamo software accelerates data sharing across nodes. The platform leverages NVIDIA Spectrum-X Ethernet as a high-performance fabric for RDMA-based access, enabling low-latency retrieval and high-bandwidth sharing of context memory across rack-scale systems, which is critical for improving time-to-first-token and multi-turn responsiveness.
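The tiering idea behind the platform — keep hot KV cache blocks in GPU memory and spill the rest to a shared, network-attached context store instead of recomputing them — can be modeled with a toy two-tier cache. This is a conceptual sketch only; the class and method names are hypothetical and do not represent the BlueField-4, DOCA, or NIXL APIs:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy model of tiered context memory: a small 'GPU' tier spills
    least-recently-used sequences to an external 'storage' tier.
    Hypothetical illustration, not an NVIDIA API."""

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()   # hot tier (stand-in for HBM)
        self.storage = {}          # cold tier (stand-in for networked context store)

    def put(self, seq_id, kv_blocks):
        self.gpu[seq_id] = kv_blocks
        self.gpu.move_to_end(seq_id)            # mark as most recently used
        while len(self.gpu) > self.gpu_capacity:
            evicted, blocks = self.gpu.popitem(last=False)
            self.storage[evicted] = blocks      # offload instead of discarding

    def get(self, seq_id):
        if seq_id in self.gpu:                  # hot-tier hit
            self.gpu.move_to_end(seq_id)
            return self.gpu[seq_id]
        blocks = self.storage.pop(seq_id)       # fetch from the cold tier
        self.put(seq_id, blocks)                # promote back to the hot tier
        return blocks

cache = TieredKVCache(gpu_capacity=2)
cache.put("agent-a", ["kv-a"])
cache.put("agent-b", ["kv-b"])
cache.put("agent-c", ["kv-c"])   # "agent-a" is offloaded, not lost
cache.get("agent-a")             # promoted back; "agent-b" is offloaded
```

The payoff of this pattern is that a returning agent's context is retrieved rather than recomputed from scratch, which is what improves time-to-first-token and multi-turn responsiveness when the cold tier is reachable over a low-latency RDMA fabric.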
The platform has garnered immediate support from a broad ecosystem of storage innovators, including AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data, and WEKA. These partners are expected to build next-generation AI storage solutions incorporating BlueField-4, with availability slated for the second half of 2026. This collaboration points to the platform's role in defining a new standard for infrastructure capable of supporting the next frontier of long-context, agentic AI inference at scale.
About NVIDIA
NVIDIA is the world leader in AI and accelerated computing.