What does it take to deliver the right recommendation to hundreds of millions instantly?
Nikolai Savushkin says it’s a mix of scientific rigor, architectural discipline, and relentless curiosity.
In this conversation, he walks us through Yandex’s RecSys evolution: from billion-parameter transformers and massive real-world datasets to generative recommendation models and user-centric design principles. Expect sharp insights on scaling, ethics, experimentation, and the future of hyper-contextual personalization.
Yandex’s recommender systems operate across one of the world’s largest digital ecosystems, serving hundreds of millions of users daily. Balancing research innovation with reliability demands a dual-track framework: continuous experimentation alongside a tightly engineered production layer. New technologies, like our ARGUS transformer, are first validated through rigorous A/B testing in select services. This disciplined process ensures that research advances in algorithms, embedding methods, and contextual personalization translate into measurable user benefits while maintaining system stability and performance across our diverse platforms.
The adoption of transformer-based architectures marked a pivotal shift in Yandex’s recommendation accuracy. For a long time, it seemed that recommendation models had a size ceiling. However, our research showed that each increase in model size, up to 1 billion parameters, led to a consistent improvement in quality, following a predictable scaling law. Transformer models, trained on extensive sequential data, enabled a deeper understanding of user intent and behavior over time. This architecture captured long-range dependencies (how past actions influence future interest) while maintaining computational efficiency. Once internal metrics demonstrated substantial improvements in retrieval and ranking precision, transformers became foundational to Yandex’s personalization pipeline. The result is a scalable, context-aware recommendation engine capable of delivering precise, adaptive results across heterogeneous user bases.
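To make the idea of a predictable scaling law concrete, here is a minimal sketch of how such a trend can be checked. It assumes the law takes the common power-law form loss = a * N^(-b), which the interview does not specify. The function name `fit_power_law` and every number below are hypothetical: the data points are synthetic, generated from the formula itself, and are not Yandex measurements.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size).

    Returns (a, b) for the assumed form loss = a * size ** (-b).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic points generated from loss = 5.0 * N ** -0.1 (illustration only).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, losses)
print(f"fitted a={a:.3f}, b={b:.3f}")
```

If measured points fall on a straight line in log-log space, as these synthetic ones do by construction, the fitted exponent can be used to estimate what the next increase in model size is likely to buy.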
We focus on developing core technological breakthroughs that can be adapted to different services. For instance, the ARGUS architecture is designed as a powerful, general-purpose user model. Its ability to understand long-term user preferences from sequential data is a universal benefit. We then implement this core technology as a feature within the existing ranking stack of each service, such as music streaming or the marketplace. This allows each vertical to leverage the power of a large transformer while preserving the specific nuances and optimizations it requires.
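As a rough illustration of what plugging a user model into an existing ranking stack "as a feature" can look like, the sketch below appends one extra score to the features a service's ranker already consumes. Everything here is an assumption made for illustration: the interview does not describe ARGUS's outputs, so the user and item vectors, the dot-product score, and the names `add_user_model_feature` and `user_model_score` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def add_user_model_feature(features, user_vec, item_vec, name="user_model_score"):
    """Return a copy of the ranker's feature dict with one extra feature.

    The existing features are left untouched, so the service-specific
    ranker keeps its own signals and simply gains one more input.
    """
    enriched = dict(features)
    enriched[name] = dot(user_vec, item_vec)
    return enriched

# Hypothetical feature set for one candidate item in one service.
existing = {"popularity": 0.3, "freshness": 0.8}
enriched = add_user_model_feature(existing, [1.0, 2.0], [0.5, 0.25])
print(enriched)
```

The design point is the one made above: the shared model contributes a signal, while each vertical's ranker decides how much weight that signal deserves.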
We believe that accuracy and user satisfaction are fundamentally linked. Our focus is on building systems that understand user intent deeply and help users discover new content. For example, a key success metric for ARGUS was its improved performance in 'novelty/discovery' scenarios, which naturally promotes diversity. Furthermore, we provide users with controls that allow them to guide recommendations. Our primary ethical commitment is to build transparent and controllable systems that earn user trust through relevance and utility.
Performance metrics—such as a 20% increase in target user actions—offer tangible validation, but true success in personalization extends beyond numbers. The primary objective is relevance that feels effortless: moments when users encounter precisely what they need or enjoy, without conscious search. System success is measured through engagement quality, diversity of discovery, and sustained user trust. Equally important are stability, responsiveness, and low cognitive friction—factors that determine how natural recommendations feel within daily use. When personalization deepens satisfaction and discovery rather than mere consumption, it achieves its fullest expression.
The Yambda dataset addresses persistent limitations in recommender-system research. Yambda introduces a large-scale, temporally structured dataset of nearly five billion interactions, providing real-world complexity absent in traditional academic benchmarks. The dataset includes temporal splits, organic versus recommended event flags, and contextual metadata—key for evaluating algorithmic generalization. Complementing this, the LogQ method refines sampled-softmax training to reduce bias and improve ranking accuracy in large retrieval models. Together, these initiatives narrow the divide between academic experimentation and industrial-scale deployment, enabling more reproducible, transparent, and transferable research across the global recommendation community.
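For readers unfamiliar with the technique that LogQ builds on, here is a minimal sketch of the standard logQ correction for sampled softmax: each logit is reduced by the log of the probability with which that item was sampled, so that frequently sampled (popular) negatives are not over-penalized. This shows the textbook correction only, not the refinement described above, whose details are not given in this interview. The function name and the toy numbers are hypothetical.

```python
import math

def sampled_softmax_loss(pos_logit, neg_logits, pos_q, neg_qs, correct=True):
    """Cross-entropy over one positive item and a set of sampled negatives.

    With correct=True, log q(item) is subtracted from every logit
    (the standard logQ correction), where q is the sampling probability.
    """
    logits = [pos_logit] + list(neg_logits)
    qs = [pos_q] + list(neg_qs)
    if correct:
        logits = [s - math.log(q) for s, q in zip(logits, qs)]
    # Numerically stable log-sum-exp over the candidate set.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_z - logits[0]

# Toy case: a rare positive (q=0.01) against one popular negative (q=0.5).
raw = sampled_softmax_loss(0.0, [0.0], 0.01, [0.5], correct=False)
corrected = sampled_softmax_loss(0.0, [0.0], 0.01, [0.5], correct=True)
print(raw, corrected)
```

In the toy case the two items have equal raw scores, but the negative was sampled fifty times more often than the positive; the correction accounts for that, so the corrected loss is lower than the uncorrected one. When all items share the same sampling probability, the correction shifts every logit equally and the loss is unchanged.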
We foster an environment where our teams can work on frontier research problems that have immediate, large-scale impact. Engineers and researchers get to explore cutting-edge challenges—like scaling transformers to a billion parameters or developing new training paradigms—and then see their work deployed to millions of users. This direct line from theoretical innovation to tangible product impact is a powerful catalyst for professional growth. We also contribute to the community by publishing our findings and releasing datasets, which fosters external collaboration and learning.
Currently, we are working on an end-to-end generative recommendation model. This model is not just about improving the existing stages of candidate generation and ranking — it aims to replace them entirely with a single neural network module.
This shift presents us with a major challenge on two fronts: not only in cutting-edge research, but also in the transformation of the underlying infrastructure that powers all our systems.
I am excited by the shift from personalization based on past actions to systems that understand user context and intent. ARGUS, with its long-term context, is a step in this direction. The future lies in systems that can adapt to a user's current situation and mood, making digital experiences not just personalized, but truly contextual and dynamic. This will unlock new levels of relevance and discovery, making technology more adaptive and helpful for everyone.
Nikolai Savushkin is the Head of the Recommendation Systems team at Yandex, where he oversees the development and implementation of content and product personalization technologies for millions of users.
Under his leadership, the team created a unique generative neural network-based recommendation system. The implementation of this method significantly improved conversion rates and the accuracy of personalized recommendations, increasing target user action metrics by up to 20%. The integration of these new models into the company's services marked the most successful single deployment in several years.
Yandex is a global technology company that builds intelligent products and services powered by machine learning. Its goal is to help consumers and businesses better navigate the online and offline world. Since 1997, Yandex has delivered world-class, locally relevant search and information services and developed market-leading on-demand transportation services, navigation products, and other mobile applications for millions of consumers worldwide.
Learn more at yandex.com/company