
Cerebras Unveils Qwen3-235B: Fastest AI Model with 131K Context

by: Business Wire | July 9, 2025

Cerebras Systems has launched Qwen3-235B, a frontier AI model with full 131K-token context support, on its Inference Cloud platform. Announced at the RAISE Summit in Paris, the model combines high speed with low cost, targeting enterprise AI deployments for coding, reasoning, and data-intensive workflows.

Quick Intel

  • Cerebras launches Qwen3-235B, world’s fastest frontier AI model, on Inference Cloud.

  • Achieves 1,500 tokens/second, reducing response times from minutes to 0.6 seconds.

  • Supports 131K context, enabling production-grade code generation and document analysis.

  • Costs $0.60 per million input tokens and $1.20 per million output tokens, roughly one-tenth the price of comparable closed-source models.

  • Partners with Cline to enhance coding with 10-20x faster generation speeds.

  • Rivals Claude 4 Sonnet, Gemini 2.5 Flash, and DeepSeek R1 in performance.

Breakthrough Performance with Qwen3-235B

Cerebras’ Qwen3-235B, developed by Alibaba, leverages a mixture-of-experts architecture to deliver frontier-level intelligence, matching models like Claude 4 Sonnet and DeepSeek R1 across science, coding, and general knowledge benchmarks, per Artificial Analysis tests. Powered by Cerebras’ Wafer Scale Engine, it achieves 1,500 tokens per second, slashing response times from 1-2 minutes to 0.6 seconds for reasoning, coding, and deep-RAG workflows. “This is the first time a world-class reasoning model—on par with DeepSeek R1 and OpenAI’s o-series—can return answers instantly,” said Andrew Feldman, CEO and co-founder of Cerebras.
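The latency claim follows directly from decode throughput: generation time is output length divided by tokens per second. A back-of-envelope sketch (the 900-token response size and the 10 tokens/second baseline are illustrative assumptions, not figures from the release):

```python
# Illustrative latency arithmetic: time to stream a completion is
# output tokens divided by decode throughput (tokens per second).

def generation_time(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate a response at a given decode speed."""
    return output_tokens / tokens_per_second

# A ~900-token reasoning answer at the quoted 1,500 tokens/second:
fast = generation_time(900, 1500)   # 0.6 seconds
# The same answer at a hypothetical 10 tokens/second deployment:
slow = generation_time(900, 10)     # 90 seconds

print(f"{fast:.1f}s vs {slow:.0f}s ({slow / fast:.0f}x faster)")
```

This is why the release frames the gain as "minutes to 0.6 seconds": at interactive throughput, even long reasoning chains return before the user disengages.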

Expanded Context for Enterprise Applications

Cerebras has quadrupled its context length support from 32K to 131K tokens, the maximum for Qwen3-235B, enabling the model to process large codebases and complex documents. This capability supports production-grade application development, addressing the growing enterprise code generation market. “With Cerebras’ inference, developers using Cline are getting a glimpse of the future, as Cline reasons through problems, reads codebases, and writes code in near real-time,” said Saoud Rizwan, CEO of Cline. The platform’s efficiency allows Cerebras to offer Qwen3-235B at $0.60 per million input tokens and $1.20 per million output tokens, significantly undercutting closed-source alternatives.
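The per-million-token rates translate into per-request costs straightforwardly. A minimal sketch using the published $0.60/$1.20 rates (the token counts in the example are hypothetical, chosen to reflect a large-context deep-RAG query):

```python
# Cost estimate at the quoted Qwen3-235B rates.
# Token counts below are hypothetical examples, not from the release.

INPUT_RATE = 0.60 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.20 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the published per-million-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. 100K tokens of retrieved context plus a 2K-token answer:
cost = request_cost(100_000, 2_000)
print(f"${cost:.4f} per request")  # about $0.0624
```

At these rates, even queries that fill most of the 131K context stay in the cents-per-request range, which is the basis for the release's cost comparison against closed-source alternatives.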

Strategic Partnership with Cline

Cerebras has partnered with Cline, a leading coding agent for Microsoft VS Code with over 1.8 million installations, to showcase Qwen3-235B’s capabilities. Cline users can access Cerebras’ Qwen models, starting with Qwen3-32B at 64K context on the free tier, with plans to include Qwen3-235B at 131K context. This integration delivers 10-20x faster code generation compared to competitors like DeepSeek R1, enhancing developer productivity. “Everything happens so fast that developers stay in flow, iterating at the speed of thought. This kind of fast inference isn’t just nice to have—it shows us what’s possible when AI truly keeps pace with developers,” said Saoud Rizwan.

Cost-Efficient AI for Enterprises

Qwen3-235B offers enterprise-grade performance at a fraction of the cost of closed-source models, making it an attractive alternative to OpenAI and Anthropic. Available through AWS Marketplace, it integrates seamlessly with enterprise workflows, supporting applications from financial services to developer tools. Cerebras’ platform eliminates latency bottlenecks, enabling real-time AI for mission-critical tasks, positioning it as a leader in scalable, high-performance AI solutions.

Cerebras’ launch of Qwen3-235B sets a new benchmark for AI performance, combining frontier intelligence, instant responsiveness, and cost efficiency to empower enterprises in building next-generation AI applications.

 

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building from the ground up a new class of AI supercomputer. Our flagship product, the CS-3 system, is powered by the world’s largest and fastest commercially available AI processor, our Wafer-Scale Engine-3. CS-3s are quickly and easily clustered together to make the largest AI supercomputers in the world, and make placing models on the supercomputers dead simple by avoiding the complexity of distributed computing. Cerebras Inference delivers breakthrough inference speeds, empowering customers to create cutting-edge AI applications. Leading corporations, research institutions, and governments use Cerebras solutions for the development of pathbreaking proprietary models, and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on-premises.
