AI Tools for Audio – An Overview of the Latest Applications for Sound Postproduction

We talk a lot about incredible image generators, the sentient powers of ChatGPT, and how artificial intelligence already influences the video branch. The smallest invisible features based on neural networks catch our attention even in post-production software. Yet, the area of impressive soundscapes somehow remains slightly out of focus. But believe me, technological advancement has certainly not lagged behind. Let’s take a look at different AI tools for audio and see how far they have come.

There’s no need to debate about how artificial intelligence has permeated every facet of our lives. Sometimes so fast, it seems alarming. Google’s AI can now identify the music you listen to based on your brain signals. Sound like fake news? Then please head over here and read the official research paper. Personally, I get goosebumps after reading the first couple of sentences.

Although at times unnerving, the development of AI technology brings with it useful tools that can enhance and speed up our work. In this article, by “us” I mean indie filmmakers who handle their own sound post-production, and also audio engineers specifically.

Text to Speech or AI voice generators

How often do you need a decent voice-over in your video projects? I imagine quite a lot. While, in my opinion, a machine can never replace a human tone and our manner of speaking, in some cases AI’s performance can be enough – for example, if you only need it for a previz, rough cut, or a story where an artificial voice is somehow appropriate.

AI voice generators aren’t big news in a world where Siri has run the show for more than a decade, but some of the latest ones are really impressive. Let’s take LOVO as an example. Their text-to-speech tool, called Genny, can express more than 25 emotions. I asked it to read a poem using a young female voice, and then repeated the request but applied the emotion “tired”. The results were extremely realistic.

AI tools for audio - LOVO and Genny voice generator
Genny’s visual appearance. Image source: a screenshot of LOVO’s interface

What I noticed during this test, though, is that only some of the speakers in Genny’s library deliver “emotional” voice-overs. So, either you have to stick to the standard narrative speech or restrict your choice to the more emotional voice presenters.

Also, LOVO is not free of charge, but it offers different pricing plans and a free 2-week trial (Genny allows you to generate 20 minutes of speech). There are also dozens of other AI voice generators on the market, like Speechify (where you can type in your text in advance to hear how it will sound read by a chosen presenter); another service that offers new users 10 minutes of generated voice-over for free; or Resemble, capable of converting the voice into different languages without providing additional data.

AI tools for audio that find the best possible music

Artificial intelligence might also support you in finding the best music for your project. If you’ve ever spent hours digging through stock libraries looking for the right track, you know the struggle is real. That’s why several platforms introduced the so-called AI-powered search.

AI tools for audio - Uppbeat AI playlist generator
Image source: Uppbeat

For example, not long ago, the British free music platform Uppbeat launched a new feature – AI-generated playlists, based on the text inputs that users provide. It works quite simply: You describe a scene from your video, or what the music should sound like, and in mere seconds, the platform offers you various suitable tracks from its library. As the developers say, their system uses the large language model ChatGPT, which is incorporated into the search.

You can read more about how to work with this feature in your video projects.

Creating entire music tracks with the help of AI

When stock music becomes unbearable (which I suppose happens to all of us occasionally), neural networks can create something different for you. There are two big AI music generators at the moment (alongside hundreds of smaller ones), competing for users. The first is MusicLM from Google, and the second, MusicGen from Meta.

Both describe their software as experimental AI tools, both allow us to generate melodies from text descriptions, and both are still in the beta phase. However, while Google lets people join their AI Test Kitchen (you can sign up and wait for an invite here) to try out the new generative software, Meta’s project is completely open-sourced. We wrote about it in detail here.

So, how do music generators work? You feed their machine learning models a text description (and/or a reference track) and get back a melody. For example, you can ask the AI for “a calming violin melody backed by a distorted guitar riff”, or for “a dark-metal twisted version of the Friends intro”. According to Google, MusicLM generates music at 24 kHz that remains consistent over several minutes. MusicGen, by contrast, restricts the output tracks to 15 seconds. You can try out the latter right now on their Hugging Face space. Please tell us about your experience. Our results were quite chunky and not really ready to be used in an actual project, but neural networks learn fast. So, possibly, in the upcoming year, AI-generated music might have a shot.
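To put those two numbers in perspective, here is a minimal Python sketch of the arithmetic behind them: how many samples a clip holds at the 24 kHz quoted for MusicLM, and how many 15-second MusicGen clips it would take to cover a scene. The function names and the 2-minute scene are illustrative assumptions and not part of either tool.

```python
import math

MUSICLM_SAMPLE_RATE_HZ = 24_000  # sample rate quoted by Google for MusicLM
MUSICGEN_CLIP_SECONDS = 15       # output limit mentioned above for MusicGen


def samples_in_clip(duration_s: float, sample_rate_hz: int) -> int:
    """Number of audio samples per channel in a clip of the given length."""
    return round(duration_s * sample_rate_hz)


def clips_needed(scene_s: float, clip_s: float = MUSICGEN_CLIP_SECONDS) -> int:
    """Hypothetical helper: fixed-length clips required to cover a scene."""
    return math.ceil(scene_s / clip_s)


if __name__ == "__main__":
    # A 15-second clip at 24 kHz
    print(samples_in_clip(15, MUSICLM_SAMPLE_RATE_HZ))  # 360000
    # A 2-minute scene covered with 15-second clips
    print(clips_needed(120))  # 8
```

In other words, scoring even a short scene with 15-second outputs means generating and joining several clips, which is one reason the results are hard to use in an actual project for now.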

Sound effects with AI for audio

After the release of MusicGen, Meta also announced similar AI-powered software for sound effects. It is called AudioGen and works according to the same principle. Describe what sounds you are looking for and let the neural network do its magic.

Developers trained AudioGen on public sound effects, and when you give it a textual description of an acoustic scene, it generates 5 seconds of audio that matches your prompt. As it’s also an open-sourced project, you can try out the model on Hugging Face or download, adjust, and train it further here.

AI tools for audio - AudioGen by Meta
Testing space of AudioGen on Hugging Face. Image source: a screenshot from Hugging Face

My personal first experiences with AudioGen have been troublesome so far. While the model perfectly understands the wording and tries its best to find matching sounds, the overall track composition doesn’t feel consistent and realistic. Yet, it’s an amazing development, and I guess it won’t take long until AI offers a decent alternative to sound libraries.

As you probably remember, Adobe also announced a similar SFX generative function in their upcoming “Firefly for video” project. Its capabilities remain to be seen.

Audio postproduction and increasing speech quality

Speaking of Adobe, last year the company worked hard on developing different applications using artificial intelligence, including AI tools for audio. For example, their AI Audio enhancer (part of Adobe Podcast) can take a low-quality voice recording and make it sound as if it was captured in a professional studio. Head over here, if you want to try it out.

AI tools for audio - Adobe's AI speech enhancer
Image source: Adobe

The audio enhancer removes distracting background noise, adjusts the sound to refine the frequencies, and gives the recording an overall professional quality. This is a great speech enhancer, especially if you recorded an interview in a busy place, only had a smartphone on hand for a statement, or want to save an improperly leveled audio file. However, it works on voice only, so it can’t help you with enhancing, say, the music quality.

If you don’t have an Adobe subscription, there are other similar AI tools for this task. AI|Coustics, for example, is free to use, and supports voice files in .mp3, .wav, and .m4a, up to 30 MB, for a maximum of 10 minutes in length.
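If you batch-process recordings, it can save time to check them against these limits before uploading. Below is a minimal Python sketch of such a check. The function name is hypothetical, it does not talk to any AI|Coustics service, and treating “30 MB” as 30,000,000 bytes is an assumption.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # formats listed above
MAX_SIZE_BYTES = 30 * 1000 * 1000  # assumption: "30 MB" read as decimal megabytes
MAX_DURATION_S = 10 * 60           # 10 minutes


def upload_problems(filename: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return the reasons a file would exceed the stated limits (empty if none)."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 30 MB")
    if duration_s > MAX_DURATION_S:
        problems.append("recording is longer than 10 minutes")
    return problems


if __name__ == "__main__":
    # A 5-minute, 5 MB WAV passes all three checks
    print(upload_problems("interview.WAV", 5_000_000, 300))
    # A 40 MB, 12-minute FLAC fails all three
    print(upload_problems("concert.flac", 40_000_000, 720))
```

The file size and duration are passed in as plain numbers here, so you would read them with whatever tool you already use for inspecting your audio files.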

Separating voice and music tracks with AI tools for audio

The last useful audio tool I want to mention in this overview uses an AI called Cassiopeia, which allows users to separate voice from the soundtrack. According to the developers, the neural network uses a technology called stem separation to distinguish vocals from music. That way, it can even break down the background melody into different instruments, which lets you isolate and edit any part of the recording.

AI tools for audio - separating tracks
The tool successfully separated all the tracks in my uploaded file. Image source: a screenshot of their browser interface

Why would you need such a tool? Several reasons. Maybe you have archive footage and only want the voice-over part from it. Another use case could be parody videos on YouTube, whose creators need particular audio tracks from their favorite films or series. Creating simple karaoke backing tracks is also a good example of what this tool offers.

You can try it out without a subscription plan, but only on 10 minutes of recordings. After that, the platform charges based on the length of the audio you wish to extract.

If you need a completely zero-cost tool, then head over to Vocal Remover. This application is less powerful than its competitor (and can only separate voice from music using AI), but it does the job, so why not?

The list can go on and on

Although we already mentioned at least 10 different AI tools for audio in this article, it still feels like we are only scratching the surface. There is so much exciting research in this area, and new applications pop up every day. Have you heard of Muzify, which creates AI-generated Spotify playlists for your favorite books and novels? How about Voicify, an AI that lets its users create music covers with the voices of their favorite artists, like Taylor Swift? And…

Okay, we’ll stop here for now and turn the tables. Do you also use AI tools for audio? If so, which ones are your favorites and should definitely be on this list? What is your opinion on AI-generated music and sound effects? Let’s talk in the comment section below!

Feature image source: created with Midjourney for CineD.
