TwelveLabs, a video intelligence company focused on understanding and reasoning over video data at scale, has raised $100 million in Series B funding. The round was co-led by NEA and NAVER Ventures, with participation from Amazon, Index Ventures, Radical Ventures, Korea Investment Partners, Quadrille Capital, and Red Bull Ventures. The investment will accelerate TwelveLabs’ expansion from video understanding models into a full-stack agentic intelligence system that unifies perception, knowledge, and reasoning for video data.
TwelveLabs is positioning itself at the center of a major shift in AI: moving from simple video understanding models to full-stack agentic systems capable of reasoning across vast video datasets.
While video accounts for the majority of global data, much of it remains unstructured and difficult to analyze. Traditional AI systems built for text struggle to interpret video holistically, often missing temporal context and complex visual relationships.
TwelveLabs addresses this gap by building systems that convert video into a structured semantic layer, so that it can be searched, analyzed, and acted upon as a living intelligence system rather than left in static storage.
The company’s platform is built on proprietary multimodal models designed specifically for video.
Marengo 3.0 serves as a high-performance video embedding model that understands motion, audio, speech, and visual elements across time. It transforms raw video into searchable semantic representations.
Alongside it, Pegasus 1.5 structures video into scene-level and entity-based representations, enabling reasoning over temporal events, objects, and contextual relationships.
Together, these models form the perception layer of TwelveLabs’ system and are available through both Amazon Bedrock and TwelveLabs’ API.
TwelveLabs is moving beyond video search toward an agentic architecture where intelligence improves as more data is processed.
Instead of treating video as static content or breaking it into isolated frames, the system builds persistent memory across all ingested video. This enables continuous reasoning over time, where each new dataset enhances system-wide understanding.
The company describes this as a shift from query-based retrieval to compounding intelligence, where performance improves as more video is analyzed.
TwelveLabs has already gained traction in sectors where video analysis is mission-critical.
Organizations are using the platform to unlock value from large video archives that were previously too complex or expensive to analyze at scale.
TwelveLabs’ relationship with Amazon Web Services extends beyond investment. AWS serves as its preferred cloud provider, and the companies are collaborating on infrastructure optimization, including the use of AWS Trainium chips for video inference workloads.
TwelveLabs models are also available via Amazon Bedrock, enabling enterprise customers to integrate video intelligence capabilities into production systems.
"TwelveLabs has been pushing the boundaries of what AI can perceive and reason about since its earliest days, and we've had the privilege of partnering with them throughout that journey," said Jason Bennett, VP and Global Head of Startups and Venture Capital at AWS. "Their models have been delivering real value to customers on Amazon Bedrock for more than a year, and as they scale their video cognition system on AWS infrastructure—including our purpose-built Trainium chips—we're excited to deepen our partnership with a team that is defining video intelligence at production scale."
With its foundation models and agentic infrastructure in place, TwelveLabs is now expanding into application-layer products. Its first product, Rodeo, marks a step toward delivering end-to-end video intelligence solutions directly to users without requiring complex integration.
The company’s goal is to make video intelligence accessible to creators, enterprises, and developers through a unified platform that transforms video from a storage burden into a strategic asset.
TwelveLabs CEO and co-founder Jae Lee described the company’s long-term vision as building a system where video becomes the primary substrate of machine intelligence.
The Series B funding will support expanded research and development, global hiring, and new offices in New York and London, alongside continued operations in San Francisco and Seoul.
TwelveLabs is a video intelligence platform that enables machines to perceive, understand, and reason about video. Its architecture unifies perception, knowledge, and reasoning into a single system that compounds in value over time. The company serves industries including media, entertainment, advertising, government, security, and automotive, and is headquartered in San Francisco with global offices in Seoul, New York, Los Angeles, and London.