Penguin Solutions, Inc., the AI factory platform company, has expanded its OriginAI portfolio with new inference solutions designed to address enterprise-scale AI demands. By integrating large memory appliances with NVIDIA RTX PRO 6000 Blackwell Server Edition and NVIDIA B300 GPUs, these solutions tackle context size limitations, concurrency challenges, and low-latency requirements while improving GPU utilization, deployment speed, and infrastructure reliability.
As AI inference becomes the primary driver of enterprise value—demanding predictable, low-latency performance at scale—traditional GPU designs often face bottlenecks in memory capacity and KV cache management. Penguin’s OriginAI solutions shift focus from compute alone to holistic optimization, where memory availability directly influences latency, throughput, and user experience in real-world deployments.
Penguin’s MemoryAI KV cache server, built on CXL technology and compatible with the NVIDIA Dynamo framework, expands KV cache capacity beyond standard GPU memory limits. This enables extended context lengths, higher concurrency, and cost-efficient inference for demanding applications, delivering measurable improvements in performance and economics for next-generation AI deployments.
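To illustrate why KV cache capacity becomes the binding constraint at long context lengths and high concurrency, the back-of-the-envelope sketch below computes the memory a transformer's KV cache requires. The model dimensions (80 layers, 8 KV heads, head dimension 128, FP16) are illustrative assumptions typical of a 70B-class model with grouped-query attention, not specifications of any Penguin or NVIDIA product:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, batch_size: int,
                   bytes_per_param: int = 2) -> int:
    """Estimate KV cache size: 2 tensors (K and V) per layer,
    each of shape [batch, kv_heads, context_len, head_dim]."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch_size * bytes_per_param)

# Hypothetical 70B-class model at a 128K-token context,
# serving 16 concurrent sequences in FP16:
total_gib = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                           context_len=128_000, batch_size=16) / 2**30
print(f"KV cache: {total_gib:.0f} GiB")  # 625 GiB
```

Even with these modest assumptions, the KV cache alone far exceeds the HBM of a single GPU, which is the gap that tiering cache into CXL-attached memory is meant to close.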
The OriginAI platform incorporates ICE ClusterWare software, an advanced management layer that turns hardware into optimized AI clusters. It provides continuous health monitoring, automatic remediation to sustain peak performance, and workload isolation for enhanced data security in shared environments.
RTX PRO 6000-based architectures suit enterprise copilots, RAG systems, code assistance, and document summarization, offering lower acquisition costs, flexible on-premises deployment, and power efficiency for mid-sized models. B300-based designs target enterprise-wide platforms, long-context assistants, frontier model hosting, and agentic workloads, providing massive memory bandwidth and scalability for shared services.
These inference solutions support critical use cases across sectors:
- Financial services benefit from real-time fraud detection and high-frequency trading with minimal latency for secure, optimized transactions.
- Healthcare gains from precise diagnostics, patient monitoring, and real-time translation, where timely insights can be life-critical.
- Retail leverages personalization, inventory management, and agentic systems for enhanced customer engagement and operational efficiency.
“Penguin Solutions operationalizes and optimizes AI inferencing by delivering the performance, scalability, and reliability required to realize fully actionable insight and discovery,” said Phil Pokorny, chief technology officer at Penguin Solutions. “Organizations must understand the factors that impact inference performance—which differ significantly from training—to productize AI and deliver accurate and fast outcomes. Whether it’s for deep research or agentic applications, we optimize infrastructure for real-world workloads and enable organizations to turn AI innovation into measurable business outcomes.”
About Penguin Solutions
The most transformative technological advancements are often the hardest to deploy and optimize. Penguin Solutions, the AI factory platform company, has the innovative technologies, skills, experience, and partnerships needed to turn your AI ambitions into reality. In addition to our AI capabilities, Penguin Solutions offers memory and LED solutions serving a wide range of high-performance and specialized applications.