
PEAK:AIO has introduced a groundbreaking Unified Token Memory Feature to address AI memory bottlenecks for large language model (LLM) inference and model innovation. This platform unifies KVCache acceleration and GPU memory expansion, tackling persistent infrastructure challenges in AI workloads with a memory-centric approach.
Launched on May 19, 2025, PEAK:AIO’s Unified Token Memory Feature targets KVCache inefficiency and GPU memory saturation, two bottlenecks critical to scaling transformer models. “Whether deploying agents or scaling to million-token context windows, this appliance treats token history as memory,” said Eyal Lemberger, Chief AI Strategist at PEAK:AIO. As per-model memory demands grow beyond 500 GB, the appliance delivers 150 GB/s of throughput with sub-5-microsecond latency by combining CXL memory and Gen5 NVMe.
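To see why per-model memory demands reach hundreds of gigabytes, consider a rough KV-cache sizing calculation. The sketch below assumes an illustrative 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, FP16); none of these dimensions are PEAK:AIO-published figures.

```python
# Illustrative KV-cache sizing; all model dimensions are assumptions,
# not figures published by PEAK:AIO.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x for keys and values, one cache entry per layer per token
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=1_000_000)
print(f"{size / 1e9:.0f} GB")  # → 328 GB for one million-token context
```

At a million tokens of context, a single session's KV cache alone approaches the capacity of multiple GPUs, which is why offloading token history to a memory tier, rather than recomputing it, becomes attractive.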
Unlike designs that treat NVMe storage as a stand-in for memory, PEAK:AIO’s architecture aligns with NVIDIA’s KVCache reuse and memory reclaim models, supporting TensorRT-LLM and Triton for seamless inference acceleration. It enables KVCache reuse across sessions, context-window expansion, and GPU memory offload through CXL tiering. “We built infrastructure that behaves like memory,” Lemberger noted, highlighting microsecond, RAM-like access to token history, essential for dynamic AI workloads.
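The tiering idea described above can be sketched as a two-level cache: hot KV blocks stay in GPU memory, and cold blocks spill to a CXL-attached tier instead of being discarded and recomputed. The class and method names below are purely illustrative, not PEAK:AIO's actual API.

```python
# Hypothetical sketch of tiered KV-cache placement: hot blocks in GPU
# memory, cold blocks spilled to a CXL-attached tier. Illustrative only;
# this is not PEAK:AIO's implementation.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, gpu_capacity_blocks):
        self.gpu_capacity = gpu_capacity_blocks
        self.gpu_tier = OrderedDict()   # block_id -> KV block, LRU order
        self.cxl_tier = {}              # block_id -> KV block

    def put(self, block_id, block):
        if len(self.gpu_tier) >= self.gpu_capacity:
            # Evict the least-recently-used block to the CXL tier rather
            # than dropping it, preserving token history across sessions.
            victim_id, victim = self.gpu_tier.popitem(last=False)
            self.cxl_tier[victim_id] = victim
        self.gpu_tier[block_id] = block

    def get(self, block_id):
        if block_id in self.gpu_tier:
            self.gpu_tier.move_to_end(block_id)  # mark as recently used
            return self.gpu_tier[block_id]
        if block_id in self.cxl_tier:
            # Promote back to GPU memory on reuse: a memory-class fetch
            # instead of a full prefill recompute.
            self.put(block_id, self.cxl_tier.pop(block_id))
            return self.gpu_tier[block_id]
        return None  # true miss: the block must be recomputed
```

The design choice the sketch illustrates is that eviction is a demotion, not a deletion: a returning session finds its context in the CXL tier and resumes in microseconds, rather than paying the cost of regenerating the entire KV cache.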
Leveraging GPUDirect RDMA and NVMe-oF, the platform ensures ultra-low latency for real-time inference, agentic systems, and model creation. “Big vendors stack NVMe to fake memory. We used CXL for true memory semantics,” said Mark Klarzynski, Chief Strategy Officer. Trusted in healthcare, pharmaceutical, and enterprise AI, the solution uses off-the-shelf servers and is slated for production by Q3 2025, offering scalability and ease of integration.
PEAK:AIO’s software-defined platform supports million-token context windows and long-running agents, with early access available via sales@peakaio.com. By treating token memory as infrastructure rather than storage, it removes the limitations of traditional storage stacks, enabling enterprises to innovate in AI model development. This positions PEAK:AIO to meet the growing demands of AI-driven industries with unmatched efficiency.
PEAK:AIO’s Unified Token Memory Feature redefines AI infrastructure by eliminating memory bottlenecks for LLMs. Its CXL-driven, low-latency architecture empowers scalable, efficient AI workloads, establishing PEAK:AIO as a leader in next-generation AI data solutions.
PEAK:AIO is a software-first infrastructure company delivering next-generation AI data solutions. Trusted across global healthcare, pharmaceutical, and enterprise AI deployments, PEAK:AIO powers real-time, low-latency inference and training with memory-class performance, RDMA acceleration, and zero-maintenance deployment models.