
Tavus Launches Raven-1 for Multimodal Perception in Conversational AI


by: Business Wire | February 12, 2026

Tavus has launched Raven-1, a multimodal perception system that enables real-time conversational AI to understand emotion, intent, and context by fusing audio and visual signals. Unlike traditional systems that flatten speech into text and emotion into rigid categories, Raven-1 produces natural language descriptions of tone, expression, and posture at sentence-level granularity—allowing AI to perceive not just what users say, but how they say it. The model is now generally available across Tavus conversations and APIs.

Quick Intel

  • Tavus launches Raven-1, a multimodal perception system for conversational AI.

  • It fuses audio (tone, pacing, prosody) and visual (expression, posture, gaze) signals in real time.

  • Outputs interpretable natural language descriptions of emotional state and intent.

  • Audio perception latency is sub-100ms; combined pipeline latency under 600ms.

  • Supports custom tool calling for developer-defined events like emotional thresholds.

  • Available immediately across all Tavus APIs and Conversational Video Interface (CVI).

The Missing Layer in Conversational AI: Perception

Conversational AI has made significant strides in language generation and speech synthesis, yet understanding remains a critical gap. Most systems rely on speech-to-text transcription, stripping away tone, hesitation, sarcasm, and emotional nuance. Without perception of how something is said, AI is forced to guess at intent—and those guesses fail precisely when accuracy matters most. Raven-1 addresses this by treating audio and visual signals as a unified whole, not separate data streams.

From Categorical Labels to Natural Language Understanding

Traditional emotion detection systems flatten human expression into rigid labels like "happy" or "sad." Raven-1 takes a fundamentally different approach: it generates rich, sentence-level natural language descriptions of emotional state and attentional shifts. These outputs are directly aligned with LLMs, requiring no translation layer. This enables AI to reason over nuanced, layered, or even contradictory emotional signals—such as frustration mixed with hope—that categorical systems cannot capture.
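The "no translation layer" point above can be made concrete with a small sketch. The payload shape and field names below are assumptions, not Tavus's published schema; the sketch only illustrates why a free-form description drops straight into an LLM prompt, whereas a categorical label would need mapping into something the model can reason over.

```python
# Illustrative only: field names are hypothetical, not a documented schema.
# A sentence-level natural-language description needs no translation layer
# before it reaches an LLM prompt.

perception = {
    "utterance": "Fine. Whatever works.",
    "description": (
        "Flat tone with a slight sigh; shoulders dropped and gaze averted; "
        "sounds resigned rather than agreeable."
    ),
}

categorical_label = "neutral"  # what a label-based system might emit instead

prompt = (
    f"User said: {perception['utterance']}\n"
    f"Perceived delivery: {perception['description']}\n"
    "Respond with that delivery in mind."
)
print(prompt)
```

Note how the categorical label would have collapsed resignation, agreement, and indifference into one bucket, while the description preserves the contradiction for the LLM to reason over.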

Built for Real-Time, High-Stakes Interactions

Raven-1 was architected from the ground up for real-time operation. Audio perception completes in under 100 milliseconds, and the full multimodal pipeline completes in under 600 milliseconds. This makes it suitable for high-stakes applications like healthcare, therapy, coaching, and interviews, where, according to Tavus, up to 75% of diagnostic signal comes from patient communication rather than tests. The system excels on short, ambiguous inputs: a single "fine" or "sure" carries radically different meaning depending on delivery, and Raven-1 captures that difference.

Closing the Loop with Conversational Timing and Generation

Raven-1 functions as a perception layer that works in concert with Tavus's Sparrow-1 (conversational timing) and Phoenix-4 models, creating a closed loop where perception informs response and response reshapes the moment. This enables AI that doesn't just generate fluent language, but understands when to speak, when to listen, and how to adapt in real time to the human on the other side of the conversation.

"Raven-1 captures and interprets audio and visual signals together, enabling AI systems to understand not just what users say, but how they say it and what that combination actually means." The model is now generally available, bringing human-like perception one step closer to reality.

About Tavus

Tavus is a San Francisco-based AI research company pioneering human computing, the next era of computing built around adaptive and emotionally intelligent AI humans. Tavus develops foundational models that enable machines to see, hear, respond, and act in ways that feel natural to people.

  • Conversational AI · Real-Time AI · Human Computing