Penguin Solutions, Inc., the AI factory platform company, has expanded its OriginAI portfolio with new inference solutions designed to address enterprise-scale AI demands. By integrating large memory appliances with NVIDIA RTX PRO 6000 Blackwell Server Edition and NVIDIA B300 GPUs, these solutions tackle context size limitations, concurrency challenges, and low-latency requirements while improving GPU utilization, deployment speed, and infrastructure reliability.
As AI inference becomes the primary driver of enterprise value—demanding predictable, low-latency performance at scale—traditional GPU designs often face bottlenecks in memory capacity and KV cache management. Penguin’s OriginAI solutions shift focus from compute alone to holistic optimization, where memory availability directly influences latency, throughput, and user experience in real-world deployments.
Penguin’s MemoryAI KV cache server, built on CXL technology and compatible with the NVIDIA Dynamo framework, expands KV cache capacity beyond standard GPU memory limits. This enables extended context lengths, higher concurrency, and cost-efficient inference for demanding applications, delivering measurable improvements in performance and economics for next-generation AI deployments.
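To illustrate why KV cache capacity becomes the binding constraint at long context lengths and high concurrency, the back-of-the-envelope sketch below computes the memory a transformer's KV cache requires. The model dimensions (80 layers, 8 KV heads, head dimension 128, FP16) are illustrative assumptions typical of a 70B-class model with grouped-query attention, not specifications of any Penguin or NVIDIA product:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, batch_size: int,
                   bytes_per_param: int = 2) -> int:
    """Estimate KV cache size: 2 tensors (K and V) per layer,
    each of shape [batch, kv_heads, context_len, head_dim]."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch_size * bytes_per_param)

# Hypothetical 70B-class model at a 128K-token context,
# serving 16 concurrent sequences in FP16:
total_gib = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                           context_len=128_000, batch_size=16) / 2**30
print(f"KV cache: {total_gib:.0f} GiB")  # 625 GiB
```

Even with these modest assumptions, the KV cache alone far exceeds the HBM of a single GPU, which is the gap that tiering cache into CXL-attached memory is meant to close.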
The OriginAI platform incorporates ICE ClusterWare software, an advanced management layer that turns hardware into optimized AI clusters. It provides continuous health monitoring, automatic remediation to sustain peak performance, and workload isolation for enhanced data security in shared environments.
RTX PRO 6000-based architectures suit enterprise copilots, RAG systems, code assistance, and document summarization, offering lower acquisition costs, flexible on-premises deployment, and power efficiency for mid-sized models. B300-based designs target enterprise-wide platforms, long-context assistants, frontier model hosting, and agentic workloads, providing massive memory bandwidth and scalability for shared services.
These inference solutions support critical use cases across sectors:
- Financial services benefit from real-time fraud detection and high-frequency trading with minimal latency for secure, optimized transactions.
- Healthcare gains from precise diagnostics, patient monitoring, and real-time translation, where timely insights can be life-critical.
- Retail leverages personalization, inventory management, and agentic systems for enhanced customer engagement and operational efficiency.
“Penguin Solutions operationalizes and optimizes AI inferencing by delivering the performance, scalability, and reliability required to realize fully actionable insight and discovery,” said Phil Pokorny, chief technology officer at Penguin Solutions. “Organizations must understand the factors that impact inference performance—which differ significantly from training—to productize AI and deliver accurate and fast outcomes. Whether it’s for deep research or agentic applications, we optimize infrastructure for real-world workloads and enable organizations to turn AI innovation into measurable business outcomes.”
About Penguin Solutions
The most transformative technological advancements are often the hardest to deploy and optimize. Penguin Solutions, the AI factory platform company, has the innovative technologies, skills, experience, and partnerships needed to turn your AI ambitions into reality. In addition to our AI capabilities, Penguin Solutions offers memory and LED solutions serving a wide range of high-performance and specialized applications.