Daily Research News Online no. 40026 - $100m Funding for Video Intelligence Firm TwelveLabs

$100m Funding for Video Intelligence Firm TwelveLabs

July 2 2026

In San Francisco, 'human-like' video analysis specialist TwelveLabs has announced the raising of $100 million in Series B funding.

The company aims to build software that can see, listen and understand the world in the same way humans do: video content including audio and text elements is analysed to identify actions and objects, classify scenes and extract topics - users can also describe a moment in words and be taken straight to the relevant scene. Following earlier rounds of investment, TwelveLabs is now looking to expand beyond video understanding models into a full-stack agentic intelligence system for video, 'combining perception, knowledge, and reasoning into a single architecture.' This will enable the analysis of 'vast video archives' and the unlocking of footage that was 'historically too hard to analyze, operationalize, or monetize.'

Recent releases include the Marengo 3.0 model, which the firm says understands every sound, word and motion on screen, turning raw video into a semantic layer that machines and AI can understand and search at scale; the Pegasus 1.5 model, which turns video into structured data including scene boundaries, entities, temporal segments and semantic context; and Rodeo, the first of a number of planned applications which put the full system in the hands of creators, operators and decision-makers, without the need for integration.

The company says it has 'deep traction' in media and entertainment and is moving into the public sector as well. Sectors including advertising, security, sports and automotive continue to drive demand for the platform.

The latest round was co-led by NEA and NAVER Ventures with participation from Amazon, Radical Ventures, Korea Investment Partners, Index Ventures, Quadrille Capital, and Red Bull Ventures.

'Five years ago, we made a contrarian bet,' says Jae Lee, CEO and co-founder. 'The substrate of machine intelligence is recorded reality in motion, not language. Language is downstream of understanding. Video is the data understanding has to answer to. We have spent half a decade building the perception, knowledge, and reasoning architecture to close that gap. Models commoditize. The intelligence layer that composes them does not. This funding lets us take TwelveLabs from foundation models to a full-stack video cognition system that meets every user, every agent, and every machine that needs to understand the world. The road to Video Superintelligence starts here.'

With operations also in Seoul, New York, Los Angeles and London, the company is online at www.twelvelabs.io.