Red Hat has announced an expanded collaboration with Amazon Web Services to enhance enterprise generative AI capabilities on AWS infrastructure. The collaboration focuses on enabling Red Hat's AI software, including the Red Hat AI Inference Server, to run on AWS's custom AI silicon—Trainium and Inferentia chips—aiming to deliver greater choice, efficiency, and cost-effectiveness for production AI inference workloads.
Key points:
- Red Hat expands its collaboration with AWS to run its AI software on AWS Trainium and Inferentia AI chips.
- Red Hat AI Inference Server (powered by vLLM) will support AWS AI chips, targeting 30-40% better price/performance vs. comparable GPUs.
- Red Hat and AWS have developed an AWS Neuron operator for Red Hat OpenShift, OpenShift AI, and OpenShift Service on AWS.
- The companies are co-developing an AWS AI chip plugin for the open-source vLLM project.
- The support aims to provide a common inference layer for running any generative AI model on optimized hardware.
- Red Hat AI Inference Server support for AWS chips is expected in developer preview in January 2026.
A central goal of the collaboration is to improve the economics of scaling generative AI. By enabling the Red Hat AI Inference Server to use AWS's purpose-built AI chips, the companies are targeting 30-40% better price/performance than current GPU-based Amazon EC2 instances. This addresses a critical pain point as enterprises move AI projects from experimentation to costly production-scale inference.
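As a rough illustration of what a gain in that range means for inference costs, the sketch below computes cost per million generated tokens for two instance types. All prices and throughput figures are hypothetical placeholders, not AWS pricing or benchmark data.

```python
# Back-of-the-envelope price/performance comparison for inference.
# All numbers below are hypothetical placeholders, NOT published AWS
# pricing or benchmark results; they only illustrate the arithmetic
# behind a "30-40% better price/performance" claim.

def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Cost to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Hypothetical GPU-based instance: $4.00/hr at 2,000 tokens/s.
gpu_cost = cost_per_million_tokens(4.00, 2_000)

# Hypothetical Inferentia-based instance: $2.50/hr at 1,800 tokens/s.
inf_cost = cost_per_million_tokens(2.50, 1_800)

improvement = (gpu_cost - inf_cost) / gpu_cost
print(f"GPU:        ${gpu_cost:.3f} per 1M tokens")   # ~$0.556
print(f"Inferentia: ${inf_cost:.3f} per 1M tokens")   # ~$0.386
print(f"Price/performance improvement: {improvement:.0%}")  # ~31%
```

With these placeholder numbers, a lower-priced accelerator comes out ahead on cost per token even at somewhat lower raw throughput, which is the trade the claimed range describes.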
To simplify the operational path for customers, Red Hat and AWS have co-developed an AWS Neuron operator for the Red Hat OpenShift family of platforms, including OpenShift AI and the managed OpenShift Service on AWS. This operator provides a supported method for deploying and managing AI workloads that leverage AWS accelerators within the familiar Kubernetes-based OpenShift environment, reducing integration complexity.
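To illustrate what this looks like from the Kubernetes API side, here is a minimal sketch using the official Python kubernetes client to schedule a pod onto a Neuron-backed node. The extended resource name aws.amazon.com/neuron is the one advertised by the AWS Neuron device plugin; the image name is hypothetical, and the actual operator may expose additional custom resources beyond what is shown here.

```python
# Minimal sketch: scheduling an inference workload onto a Neuron-backed
# node in an OpenShift/Kubernetes cluster via the Python client.
# The image name is hypothetical; "aws.amazon.com/neuron" is the
# extended resource advertised by the AWS Neuron device plugin.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

container = client.V1Container(
    name="inference-server",
    image="example.com/ai-inference-server:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        # Request one Neuron device; the device plugin managed by the
        # operator makes this extended resource schedulable.
        requests={"aws.amazon.com/neuron": "1"},
        limits={"aws.amazon.com/neuron": "1"},
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="neuron-inference-demo"),
    spec=client.V1PodSpec(restart_policy="Never", containers=[container]),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

This is the same request/limit pattern used to schedule GPU workloads today, which is why the operator approach keeps Neuron-backed workloads inside the familiar OpenShift workflow.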
The collaboration extends into the open-source ecosystem. As a top commercial contributor to the vLLM project, Red Hat is working with AWS to upstream an AWS AI chip plugin to vLLM. vLLM is the engine behind Red Hat's AI Inference Server and the llm-d project, which focuses on large-scale inference. Upstreaming the plugin means that optimizations for AWS silicon also benefit the broader open-source community.
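For context on what a common inference layer means in practice, here is a minimal vLLM usage sketch. The model name is illustrative, and the exact mechanism by which the AWS AI chip plugin would be selected (e.g., automatic platform detection at startup) is an assumption; the point is that the application-facing API stays the same regardless of the accelerator underneath.

```python
# Minimal vLLM offline-inference sketch. The code looks the same
# whether the backend is a GPU or, via a hardware plugin, an AWS AI
# chip: the platform is resolved at runtime, not coded against.
# Model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what an inference server does."], params)

for out in outputs:
    print(out.outputs[0].text)
```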
This move aligns with Red Hat's strategy of providing a flexible, hybrid cloud platform for AI that is agnostic to the underlying hardware. By supporting AWS's custom silicon, Red Hat gives its customers more choice to match their AI workloads with the most efficient and cost-effective infrastructure, whether that means traditional CPUs, GPUs, or purpose-built AI accelerators like Trainium and Inferentia.
This expanded partnership strengthens the full-stack AI proposition on AWS, combining Red Hat's enterprise-grade AI software and platform management with AWS's cloud infrastructure and custom silicon, aiming to help organizations achieve optimized, scalable, and governed AI outcomes.
About Red Hat, Inc.
Red Hat is the open hybrid cloud technology leader, delivering a trusted, consistent and comprehensive foundation for transformative IT innovation and AI applications. Its portfolio of cloud, developer, AI, Linux, automation and application platform technologies enables any application, anywhere—from the datacenter to the edge. As the world's leading provider of enterprise open source software solutions, Red Hat invests in open ecosystems and communities to solve tomorrow's IT challenges. Collaborating with partners and customers, Red Hat helps them build, connect, automate, secure and manage their IT environments, supported by consulting services and award-winning training and certification offerings.