Google DeepMind unveils a new video model to rival Sora
Google DeepMind, Google’s flagship AI research lab, wants to beat OpenAI at the video generation game — and it just might, at least for a little while.
DeepMind on Monday announced Veo 2, its next-generation video-generating AI and the successor to Veo, which powers a growing number of products across Google’s portfolio. Veo 2 can create clips more than two minutes long at resolutions up to 4K (4096 x 2160 pixels).
Notably, that’s four times the resolution, and more than six times the duration, of what OpenAI’s Sora can achieve.
It’s a theoretical advantage at the moment, though. In VideoFX, Google’s experimental video creation tool where Veo 2 is now exclusively available, videos are capped at 720p and eight seconds in length. (Sora can produce clips up to 1080p and 20 seconds long.)

VideoFX is behind a waitlist, but Google says it’s working to expand the number of users who can access it this week.
Eli Collins, vice president of product at DeepMind, also told TechCrunch that Google will make Veo 2 available through its Vertex AI developer platform “as the model becomes ready for use at scale.”
“Over the coming months, we’ll continue to iterate based on feedback from users,” Collins said, “and [we’ll] look to integrate Veo 2’s updated capabilities into compelling use cases across the Google ecosystem… [W]e expect to share more updates next year.”
More controllable
Like Veo, Veo 2 can generate videos from a text prompt (e.g., “A car racing down a freeway”) or from text and a reference image.
So what’s new in Veo 2? Well, DeepMind says the model, which can create clips in a range of styles, has an improved “understanding” of physics and camera controls, and produces “sharper” footage.
By “sharper,” DeepMind means that the textures and imagery in clips are crisper, especially in scenes with a lot of movement. As for the improved camera controls, they let Veo 2 position the virtual “camera” in the videos it generates more precisely, and move that camera to capture objects and people from different angles.
DeepMind also claims that Veo 2 can more realistically model motion, fluid dynamics (like coffee being poured into a mug), and properties of light (such as shadows and reflections). That extends to different lenses and cinematic effects, as well as “accurate” human expression, DeepMind says.

DeepMind shared a few select samples from Veo 2 with TechCrunch last week. For AI-generated videos, they looked pretty good, even exceptionally good. Veo 2 seems to have a strong grasp of refraction and tricky liquids, like maple syrup, and a knack for Pixar-style animation.
But despite DeepMind’s insistence that the model is less likely to hallucinate elements like extra fingers or “unexpected objects,” Veo 2 can’t quite clear the uncanny valley.
Notice the lifeless eyes of this cartoon dog-like creature:

And the strangely slippery road in these shots – as well as the pedestrians in the background blending into each other and the buildings with physically impossible facades:

Collins acknowledged there is work to be done.
“Coherence and consistency are areas for growth,” he said. “Veo can consistently adhere to a prompt for a couple of minutes, but [it can’t] adhere to complex prompts over long horizons. Likewise, character consistency can be a challenge. There’s also room to improve in generating intricate details, fast and complex motions, and continuing to push the boundaries of realism.”
Collins added that DeepMind continues to work with artists and producers to improve video production models and tools.
“We began working with creators like Donald Glover, The Weeknd, d4vd, and others from the beginning of Veo’s development to truly understand their creative process and how technology can help realize their vision,” Collins said. “Our work with creators on Veo 1 has led to the development of Veo 2, and we look forward to working with trusted testers and creators to get feedback on this new model.”
Safety and training
Veo 2 was trained on lots of videos. That’s generally how AI models work: fed example after example of some form of data, the models pick up on patterns in that data that allow them to generate new data.
DeepMind won’t say exactly where it sourced the videos to train Veo 2, but YouTube is one possible source; Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo “may” be trained on some YouTube content.
“Veo has been trained on high-quality video-description pairs,” Collins said. “Video-description pairs are a video and an associated description of what happens in that video.”
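As a rough illustration of the concept (DeepMind’s actual data format isn’t public, so the record layout and example below are hypothetical), a video-description pair can be thought of as a simple record joining a clip with a caption of what happens in it:

```python
from dataclasses import dataclass


@dataclass
class VideoDescriptionPair:
    """One hypothetical training example: a clip plus its caption."""
    video_path: str   # path or URI to the video file
    description: str  # text describing what happens in the video


# A toy example of what such a pair might look like.
pair = VideoDescriptionPair(
    video_path="clips/0001.mp4",
    description="A person pours coffee into a white mug on a kitchen counter.",
)
```

During training, a model sees many such pairs and learns the mapping from descriptions to plausible video content.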

While DeepMind, through Google, hosts tools that let webmasters block the lab’s bots from scraping training data from their websites, it doesn’t offer a mechanism that lets creators remove works from its existing training sets. The lab and its parent company maintain that training models on public data is fair use, meaning DeepMind believes it isn’t obligated to ask permission from data owners.
Not all creatives agree, particularly in light of studies estimating that tens of thousands of film and TV jobs could be disrupted by AI in the coming years. Several AI companies, including the eponymous startup behind the popular AI art app Midjourney, are the targets of lawsuits accusing them of violating artists’ rights by training on their content without consent.
“We’re committed to working collaboratively with creators and our partners to achieve common goals,” Collins said. “We continue to work with the creative community and people across the broader industry, gathering insights and listening to feedback, including from those who use VideoFX.”
Owing to the way today’s generative models behave when trained, they carry certain risks, like regurgitation, which is when a model generates a mirror copy of its training data. DeepMind’s solution is prompt-level filters, including for violent, graphic, and explicit content.
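DeepMind hasn’t published how its prompt-level filters work. As a minimal sketch of the general idea only, a filter can reject a prompt containing blocked terms before it ever reaches the model; the term list and function here are invented for illustration:

```python
# Illustrative placeholder list; real systems use far richer policies
# (classifiers, context, allowlists), not a bare keyword set.
BLOCKED_TERMS = {"gore", "explicit"}


def is_prompt_allowed(prompt: str) -> bool:
    """Return False if the prompt contains any blocked term."""
    words = prompt.lower().split()
    return not any(term in words for term in BLOCKED_TERMS)


print(is_prompt_allowed("A car racing down a freeway"))  # True
```

In practice, production filters also run on model outputs, not just prompts, but the gatekeeping principle is the same.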
Google’s indemnity policy, which provides certain customers with a defense against allegations of copyright infringement stemming from the use of its products, won’t apply to Veo 2 until it’s generally available, Collins said.

To mitigate the risk of deepfakes, DeepMind says it’s using its proprietary watermarking technology, SynthID, to embed invisible markers into the frames Veo 2 generates. However, like all watermarking tech, SynthID isn’t foolproof.
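SynthID itself is proprietary, but the broad idea of invisible watermarking can be shown with a toy least-significant-bit scheme, which hides bits in pixel values too small to notice. This is purely illustrative; SynthID is a far more robust, learned technique, not LSB embedding:

```python
def embed_bits(pixels: list[int], bits: list[int]) -> list[int]:
    """Hide watermark bits in the least significant bit of each pixel value."""
    marked = [(p & ~1) | b for p, b in zip(pixels, bits)]
    return marked + pixels[len(bits):]  # leave remaining pixels untouched


def extract_bits(pixels: list[int], n: int) -> list[int]:
    """Recover the first n watermark bits."""
    return [p & 1 for p in pixels[:n]]


frame = [200, 131, 54, 77]               # toy grayscale pixel values
marked = embed_bits(frame, [1, 0, 1, 1])
print(extract_bits(marked, 4))           # [1, 0, 1, 1]
```

Each marked pixel differs from the original by at most 1, which is imperceptible; the trade-off is fragility, since re-encoding or cropping destroys an LSB mark, which is part of why no watermark is guaranteed to survive.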
Image upgrades
In addition to Veo 2, Google DeepMind this morning announced upgrades to Imagen 3, its commercial image generation model.
A new version of Imagen 3 is rolling out to users of ImageFX, Google’s image creation tool, starting today. It can create “brighter, better-composed” images in styles like photorealism, impressionism, and anime, according to DeepMind.
“This upgrade [to Imagen 3] also follows prompts more faithfully, and renders richer details and textures,” DeepMind wrote in a blog post provided to TechCrunch.

UI updates for ImageFX are rolling out alongside the model. Now, when users type in prompts, key terms in those prompts will become “chips” with a drop-down menu of suggested, related words. Users can use the chips to iterate on what they’ve typed, or select from a row of auto-generated descriptors beneath the prompt.