Zyphra has officially announced the release of ZAYA1-8B, a mixture-of-experts (MoE) language model designed to deliver high-performance reasoning while maintaining a small computational footprint. Developed on Zyphra’s AMD-native training stack, the model achieves competitive results against significantly larger open-weight and proprietary models in specialized domains such as mathematics and coding. The launch underscores a shift toward "intelligence density," where model efficiency is prioritized to maximize output per active parameter.
Key highlights:

- ZAYA1-8B is a mixture-of-experts model with fewer than one billion active parameters.
- The model matches or exceeds the performance of much larger models such as Mistral-Small-4-119B and Nemotron-3-Nano-30B-A3B.
- Zyphra introduced Markovian RSA, a test-time compute methodology that enables unbounded reasoning at constant memory cost.
- Training was conducted on AMD Instinct MI300X clusters with AMD Pensando Pollara networking on IBM Cloud.
- The architecture uses Compressed Convolutional Attention (CCA) and a novel MLP-based expert router for stability.
- ZAYA1-8B is released under an Apache 2.0 license and is available on Hugging Face and Zyphra Cloud.
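Zyphra has not detailed the router design here, so the following is only a minimal sketch of the general idea behind an MLP-based expert router: a small two-layer MLP scores each expert (rather than the usual single linear projection), and each token is dispatched to the top-k experts with softmax-normalized gate weights. All dimensions and weights below are illustrative assumptions, not the model's actual configuration.

```python
import math
import random

random.seed(0)

D, H, E, K = 8, 16, 4, 2  # token dim, router MLP width, experts, top-k (illustrative)

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Router weights: a small two-layer MLP instead of a single linear layer.
W1 = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(H)]
W2 = [[random.gauss(0, 0.1) for _ in range(H)] for _ in range(E)]

def route(x, k=K):
    h = [max(0.0, v) for v in matvec(W1, x)]       # ReLU hidden layer
    logits = matvec(W2, h)                          # one score per expert
    topk = sorted(range(E), key=lambda e: -logits[e])[:k]
    z = [math.exp(logits[e]) for e in topk]
    s = sum(z)
    return [(e, w / s) for e, w in zip(topk, z)]    # (expert id, gate weight)

token = [random.gauss(0, 1) for _ in range(D)]
print(route(token))
```

A higher-capacity router can produce smoother expert assignments than a purely linear one, which is consistent with the stability motivation the release cites.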
Despite its compact size, ZAYA1-8B demonstrates strong capabilities across rigorous benchmarks, including AIME and HMMT for mathematics and LiveCodeBench for programming. Its high intelligence density per parameter keeps the model competitive with first-generation frontier reasoning models such as Gemini-2.5-Pro and DeepSeek-R1-0528. Performance improves further with the new Markovian RSA methodology, which allows ZAYA1-8B to approach or surpass models like Claude 4.5 Sonnet on specific mathematical benchmarks when given extended compute.
The technical foundation of ZAYA1-8B involves several key innovations that optimize the full AI stack. The model incorporates Compressed Convolutional Attention (CCA) and learned residual scaling to control activation growth through depth without increasing FLOP costs. Its post-training process follows a four-stage reinforcement-learning (RL) cascade that includes reasoning warm-ups, difficulty-based curricula, and large-scale math and code RL.
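The exact formulation of Zyphra's learned residual scaling is not given here; a common form multiplies each sublayer's output by a small learned per-layer scale before the residual add, h ← h + α·f(h), which damps activation growth with depth at negligible FLOP cost. The toy sketch below uses a random stand-in for the sublayer and fixed scales (in training, each α would be a learned parameter) purely to show the damping effect:

```python
import math
import random

def rms(v):
    return math.sqrt(sum(x * x for x in v) / len(v))

def forward(x, alpha, depth=24, seed=0):
    rng = random.Random(seed)
    for _ in range(depth):
        fx = [rng.gauss(0, 1) for _ in x]              # stand-in sublayer output
        x = [xi + alpha * fi for xi, fi in zip(x, fx)]  # scaled residual add
    return x

x0 = [1.0] * 16
print(rms(forward(x0, 1.0)))   # unscaled residuals: the stream's norm grows with depth
print(rms(forward(x0, 0.1)))   # small residual scale: growth is damped
```

The scalar multiply adds essentially no FLOPs relative to the sublayer itself, which matches the "without increasing FLOP costs" claim above.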
"ZAYA1-8B demonstrates what is possible when architecture, pretraining, and reinforcement learning are co-designed toward a single objective: maximizing the intelligence extracted per parameter and per FLOP," said Krithik Puthalath, Founder and CEO of Zyphra. "This is the foundation of how we think about building efficient, scalable AI systems, and we are excited to continue scaling both model size and the breadth of domains our post-training stack covers."
The pretraining of ZAYA1-8B was executed entirely on AMD hardware, utilizing 1,024 MI300X GPUs. This infrastructure foundation also powers the Zyphra Cloud environment. To support the open-source community and enterprise adoption, Zyphra has made the model weights available on Hugging Face under the Apache 2.0 license. Users can also access the model as a serverless endpoint through the company's dedicated cloud platform.
The release of ZAYA1-8B highlights the increasing viability of smaller, highly optimized models in handling complex enterprise tasks that were previously reserved for massive, resource-heavy LLMs.
About Zyphra
Zyphra is an open superintelligence research and product company based in San Francisco, CA, on a mission to build human-aligned AI that helps individuals and organizations reach their fullest potential.