Google is working on generative AI soundtracks and dialogue for videos

So, AI can do film composing now.
By Cecily Mauran
Google DeepMind's model can generate audio for AI videos and existing footage. Credit: Furtseff / iStock / Getty Images Plus

Everyone knows sound is a critical component of most films and videos. After all, even when films were silent, there was still a musical accompanist letting the audience know how to feel.

The same holds true for the new crop of generative AI videos, which emerge eerily silent. That's part of why Google has been working on "video-to-audio" (V2A) technology, which "makes synchronized audiovisual generation possible." On Monday, Google's AI lab, DeepMind, shared its progress on generating such audio, including soundtracks and dialogue that automatically sync up with AI-generated videos.

Google has been hard at work developing multimodal generative AI technology to compete with rivals. OpenAI has its AI video generator Sora (yet to be publicly released) and GPT-4o, which creates AI voice responses. Companies like Meta and Suno have been exploring AI-generated audio and music, but pairing audio with video is relatively new. ElevenLabs has a similar tool that matches audio to text prompts, but DeepMind says V2A is different because it doesn't require text prompts.


V2A can be paired with AI video tools like Google Veo, or applied to existing footage such as archival material and silent films. It can generate soundtracks, sound effects, and even dialogue. It works by using a diffusion model, trained on visual inputs, natural language prompts, and video annotations, to gradually refine random noise into audio that fits the tone and context of a video.

Google DeepMind says V2A can "understand raw pixels," so a text prompt isn't required to generate audio, though one does improve accuracy. The model can also be given positive or negative prompts that steer the generated audio toward or away from particular sounds. Along with the announcement, DeepMind released several demo videos, including a dark, creepy hallway accompanied by horror music, a lone cowboy at sunset scored to a mellow harmonica tune, and an animated figure talking about its dinner.

V2A will include Google's SynthID watermarking as a safeguard against misuse, and DeepMind's blog post says the feature is currently undergoing testing before it's released to the public.

Cecily Mauran

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on Twitter at @cecily_mauran.

