
Cerebras and Core42 have announced the global availability of OpenAI’s gpt-oss-120B, delivering record-breaking inference speeds of 3,000 tokens per second via Core42’s AI Cloud and Compass API. Announced on August 28, 2025, the collaboration gives enterprises, researchers, and governments scalable, high-performance AI for real-time reasoning and agentic workloads.
- Cerebras and Core42 launch OpenAI’s gpt-oss-120B globally on August 28, 2025.
- Achieves 3,000 tokens/second, verified by Artificial Analysis, surpassing GPU providers.
- Pricing: $0.25/M input tokens, $0.69/M output tokens, with 128K-token context support.
- Powered by Cerebras’ CS-3 system and Wafer-Scale Engine (WSE) for ultra-low latency.
- Core42’s AI Cloud enables seamless integration for enterprise-scale AI applications.
- Supports semantic search, code execution, automation, and decision intelligence.
Cerebras, the world’s fastest AI provider, and Core42, a G42 company specializing in sovereign cloud and AI infrastructure, have partnered to deliver OpenAI’s gpt-oss-120B at unmatched speeds. “Together with Cerebras and Core42, we’re making our best and most usable open model available at unprecedented speed and scale,” said Trevor Cai, Head of Infrastructure at OpenAI. The collaboration leverages Cerebras’ CS-3 system and wafer-scale engine (WSE), achieving 3,000 tokens per second—outpacing NVIDIA’s Blackwell DGX B200 (900 tokens/second) by over 3x in single-user tests.
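To put the throughput gap in concrete terms, a quick back-of-the-envelope calculation shows the single-request generation time at each cited rate; the 1,500-token response length is an illustrative assumption, not a figure from the announcement:

```python
# Back-of-the-envelope latency comparison at the cited throughputs.
# The 1,500-token response length is an illustrative assumption.
RESPONSE_TOKENS = 1_500
CEREBRAS_TPS = 3_000  # tokens/second on Cerebras, per the announcement
DGX_B200_TPS = 900    # tokens/second cited for NVIDIA Blackwell DGX B200 (single user)

cerebras_s = RESPONSE_TOKENS / CEREBRAS_TPS
dgx_s = RESPONSE_TOKENS / DGX_B200_TPS
print(f"Cerebras: {cerebras_s:.2f}s  DGX B200: {dgx_s:.2f}s  "
      f"({CEREBRAS_TPS / DGX_B200_TPS:.1f}x faster)")
# Cerebras: 0.50s  DGX B200: 1.67s  (3.3x faster)
```

The 3.3x ratio is where the “over 3x” claim comes from.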
- Industry-Leading Speed: gpt-oss-120B delivers 3,000 tokens/second, enabling real-time applications like live coding assistants and instant document Q&A.
- Long-Context Understanding: Supports a 128K-token context window for complex, multi-turn reasoning.
- Cost Efficiency: Priced at $0.25/M input tokens and $0.69/M output tokens, offering an 8.4x price-performance advantage over the median GPU cloud.
- Seamless Integration: Core42’s Compass API allows instant API access, with no refactoring needed for existing OpenAI endpoint users.
- Enterprise Scalability: Supports agentic AI for semantic search, code execution, and decision intelligence, scaling from experimentation to production.
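Because Compass exposes an OpenAI-compatible interface, existing client code typically only needs a new base URL and model name. The sketch below builds a standard chat-completions request payload; the base URL is a placeholder (check Core42’s documentation for the real endpoint) and the model identifier "gpt-oss-120b" is an assumption:

```python
import json

# Sketch of an OpenAI-compatible chat-completions request for gpt-oss-120B.
# BASE_URL is a placeholder, not a documented Compass endpoint, and the
# model identifier "gpt-oss-120b" is an assumption.
BASE_URL = "https://example-compass-endpoint/v1"  # placeholder

payload = {
    "model": "gpt-oss-120b",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Summarize this contract clause in one sentence."}
    ],
    "max_tokens": 256,
}

# With the official openai SDK, the same request would look like:
#   client = OpenAI(base_url=BASE_URL, api_key="...")
#   client.chat.completions.create(**payload)
print(json.dumps(payload, indent=2))
```

The point of the “no refactoring” claim is that only `BASE_URL` and `model` change; the message schema stays the same.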
“The latest chapter in our ongoing strategic partnership with Core42 now delivers the world’s most capable open-weight models directly into the hands of enterprises,” said Andrew Feldman, CEO of Cerebras. Core42’s AI Cloud ensures compliance and flexibility, while Cerebras’ WSE eliminates GPU bottlenecks, delivering ultra-low latency and deterministic performance. “By running OpenAI gpt-oss on Cerebras hardware within Core42’s AI Cloud, we are setting a new benchmark for performance, flexibility, and compliance,” said Kiril Evtimov, CEO of Core42.
The gpt-oss-120B model, a 120B-parameter mixture-of-experts architecture with 128 experts across 36 layers, rivals proprietary models such as Gemini 2.5 Flash and Claude Opus 4 on math, science, and coding tasks. Its Apache 2.0 license enables fine-tuning and on-premises deployment, critical for sensitive-data workloads. The partnership aligns with the growing demand for agentic AI, projected to drive $1 trillion in economic impact by 2030, offering enterprises unmatched speed and affordability.
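In a mixture-of-experts layer, a router sends each token to only a few of the 128 experts, which is how a 120B-parameter model can keep per-token compute low. The toy sketch below shows top-k expert routing; the k=4 value and the gating details are illustrative assumptions, not published gpt-oss internals:

```python
import math
import random

NUM_EXPERTS = 128  # experts per layer, per the announcement
TOP_K = 4          # experts activated per token -- an illustrative assumption

def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights,
    so each token is processed by only k of the NUM_EXPERTS experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)
    return [(i, probs[i] / weight_sum) for i in top]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
# Only TOP_K experts fire for this token; their mixture weights sum to 1.
assert len(selected) == TOP_K
assert abs(sum(w for _, w in selected) - 1.0) < 1e-9
```

Each of the 36 layers repeats this routing independently, so different tokens exercise different expert subsets.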
Developers and enterprises can access gpt-oss-120B via Core42’s AI Cloud at https://aicloud.core42.ai or Cerebras Cloud with a free API key at cerebras.ai/openai. The platform supports high-throughput inference for workloads like reasoning and long-context generation, with pricing at $0.25/M input and $0.69/M output tokens.
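At the published rates, per-request cost is easy to estimate; the token counts in the example below are arbitrary illustrations, not benchmark figures:

```python
# Cost estimate at the published rates: $0.25 per million input tokens,
# $0.69 per million output tokens. Token counts below are illustrative.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.69 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the announced gpt-oss-120B pricing."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 100K-token document summarized into a 1K-token answer:
cost = request_cost(100_000, 1_000)
print(f"${cost:.4f}")  # $0.0257
```

Even a near-context-limit prompt stays in the cents range, which is what makes long-context workloads like whole-document Q&A economical.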
Cerebras Systems, powered by its Wafer-Scale Engine-3 and CS-3 system, delivers the world’s fastest AI inference. Trusted by leading corporations and governments, Cerebras supports open-source models with millions of downloads, simplifying large-scale AI deployments.
Core42, a G42 company, provides sovereign cloud and AI infrastructure, empowering enterprises with scalable, compliant solutions. Its Compass API delivers high-performance AI capabilities globally.