Deepgram, the world’s most realistic and real-time Voice AI platform, unveiled Flux at VapiCon 2025. Flux is the first conversational speech recognition (CSR) model optimized for real-time voice agents, and it enables natural dialogue flow by modeling speaker timing and context rather than transcribing words alone.
Traditional automatic speech recognition (ASR) systems were designed for transcription tasks such as captions or meeting notes, and they fall short in live dialogue: developers must bolt on separate voice activity detection and turn-taking logic, which adds latency and errors. Flux addresses this by modeling dialogue flow natively, capturing not just words but also the cues that a speaker has finished and the timing of a response, so that interactions feel engaging and human-like. This shift supports enterprises adopting automated customer self-service, agent assist tools, and embedded conversational experiences across sectors. It also aligns with a voice AI agents market projected to grow from $2.4 billion in 2024 to $47.5 billion by 2034, a 34.8% CAGR.
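The market projection above can be sanity-checked with the standard compound annual growth rate formula. The short Python sketch below is illustrative only: the function names `cagr` and `project` are hypothetical, and the only inputs are the figures quoted above ($2.4 billion, $47.5 billion, ten years).

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate, returned as a fraction (0.35 means 35%)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the release, in billions of US dollars.
START_2024 = 2.4
END_2034 = 47.5
YEARS = 10

implied_rate = cagr(START_2024, END_2034, YEARS)
projected_2034 = project(START_2024, 0.348, YEARS)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"2034 value at 34.8% CAGR: ${projected_2034:.1f}B")
```

Run as written, the implied rate rounds to the quoted 34.8%, so the three figures are consistent with one another.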
Flux embeds turn-taking intelligence in the model itself, providing context-aware turn detection and barge-in handling so that exchanges proceed without awkward pauses. It detects end of turn in roughly 260 ms and emits events that let an agent begin preparing an eager response, reducing delays in dynamic conversations. Developers get simpler workflows: turn-complete transcripts and structured cues replace complex client-side logic, cutting the time to a production-ready agent from months to weeks. For enterprise scalability, Flux matches Nova-3 accuracy, handles 100+ concurrent streams per GPU, and offers predictable costs, avoiding the inefficiencies of add-on systems.
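To illustrate how structured turn cues can simplify client code, the Python sketch below shows one way an agent might react to them. It is a sketch under stated assumptions, not Deepgram's API: the event names (`start_of_turn`, `eager_end_of_turn`, `turn_resumed`, `end_of_turn`), the `TurnEvent` and `TurnManager` classes, and the action labels are all hypothetical and chosen only to mirror the behaviors described above (eager responses, turn-complete transcripts, barge-in).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnEvent:
    """Hypothetical turn cue; `kind` values are illustrative, not a real schema."""
    kind: str
    transcript: str = ""


@dataclass
class TurnManager:
    """Maps turn cues to agent actions, recorded as strings for inspection."""
    agent_speaking: bool = False
    draft_for: Optional[str] = None  # transcript the speculative reply was built from
    actions: List[str] = field(default_factory=list)

    def handle(self, event: TurnEvent) -> None:
        if event.kind == "start_of_turn":
            # Barge-in: the user started talking while the agent was speaking.
            if self.agent_speaking:
                self.actions.append("stop_speaking")
                self.agent_speaking = False
        elif event.kind == "eager_end_of_turn":
            # The turn is probably over: start drafting a reply early.
            self.draft_for = event.transcript
            self.actions.append("draft_reply")
        elif event.kind == "turn_resumed":
            # The user kept talking, so the speculative draft is stale.
            if self.draft_for is not None:
                self.draft_for = None
                self.actions.append("discard_draft")
        elif event.kind == "end_of_turn":
            # The turn is confirmed complete: reuse the draft only if it
            # was built from the same transcript.
            if self.draft_for == event.transcript:
                self.actions.append("speak_draft")
            else:
                self.actions.append("generate_and_speak")
            self.draft_for = None
            self.agent_speaking = True


manager = TurnManager()
for ev in [
    TurnEvent("start_of_turn"),
    TurnEvent("eager_end_of_turn", "book a table"),
    TurnEvent("turn_resumed"),
    TurnEvent("eager_end_of_turn", "book a table for two"),
    TurnEvent("end_of_turn", "book a table for two"),
    TurnEvent("start_of_turn"),  # user interrupts the agent's reply
]:
    manager.handle(ev)
print(manager.actions)
```

In this sketch the client keeps no timers or silence thresholds of its own. It reacts only to the cues it receives, which is the kind of simplification the paragraph above describes.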
The launch at VapiCon highlights Flux's alignment with developer platforms focused on conversational interfaces. “At Vapi, our mission has always been to give engineering teams a platform to build their conversational front-door,” said Jordan Dearsley, Founder, CEO, Vapi. “Deepgram’s launch of Flux is a perfect example of that vision coming to life. By embedding turn-taking directly into recognition, Flux solves one of the hardest challenges in conversational AI. We’re thrilled Deepgram chose VapiCon to introduce this breakthrough, and we can’t wait to see the incredible voice agents developers create with it.”
“Flux redefines what speech recognition can do for real-time AI,” said Scott Stephenson, CEO and Co-Founder, Deepgram. “For decades, ASR was built to listen and record. Flux is different — it listens, understands, and guides conversations with human-like timing. It’s the foundation voice agents have been waiting for and is our latest milestone towards solving the Audio Turing Test.”
"At Lindy, our mission is to build the world's most capable AI employees, and voice is a big part of this," said Flo Crivello, Founder and CEO, Lindy. "Deepgram has been our partner of choice since the earliest days, and Flux brings things to the next level: there is simply nothing coming close on the market in terms of latency or conversation awareness. It's enabled us to deliver the smoothest, most natural, interruption-free conversations for our customers."
Flux is aimed at voice AI developers, engineering leads, and AI teams building real-time agents, as well as enterprise leaders improving customer experience through agent assist and conversational platforms. Ecosystem partners, including platform providers and cloud architects, can integrate CSR into their broader AI stacks. Flux is now generally available through cloud APIs or as a self-hosted deployment, and it builds on Deepgram's experience processing over 50,000 years of audio and transcribing over 1 trillion words.
To mark the launch, Deepgram's OktoberFLUX promotion offers free access throughout October, with support for up to 50 concurrent connections for experimentation. Developers can explore Flux at https://deepgram.com/flux and start prototyping responsive voice agents.
With Flux, Deepgram positions itself as a leader in voice-native models and in the enterprise shift toward more natural, efficient AI interactions.
Deepgram is the world’s most realistic and real-time Voice AI platform, offering speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities, all powered by our enterprise-grade runtime. More than 200,000 developers build with Deepgram’s voice-native foundational models, accessed through cloud APIs or as self-hosted/on-premises APIs, because of our unmatched accuracy, low latency, and pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Having processed over 50,000 years of audio and transcribed over 1 trillion words, Deepgram understands voice better than any other organization in the world.