AI Music Generators in Action – the Breakneck Development of Automated Soundtracks

A couple of weeks ago, we talked about creative approaches to film music and sound design. Then, my colleague Nino Leitner came to work with a fully AI-generated CineD song that sounded nothing like abstract mechanical noise and very much like a real music track. It made me wonder: how far have these tools advanced over the past year? What is the current state of development in this area? Can AI music generators already compose a film score? Let’s test a bunch of them and find out!

It’s been a while since I wrote about MusicLM from Google and MusicGen from Meta – the two biggest applications for AI music generation at the time. Testers of these tools were not happy back then. “Sounds horrible,” “melodies are random,” and “chord progressions don’t make any sense” – these are just a few comments I can remember.

However, AI training doesn’t stop. Roughly a year later, generated tracks are worlds apart from the results we got before. For one, artificial intelligence learned how to “sing.”

AI music generators: a huge leap

Before we unwrap new and popular tools, let’s take a look at the already familiar ones. For instance, Google’s MusicLM model doesn’t exist anymore. Instead, the developers took the feedback on board and launched MusicFX. They still call it a “generative AI text-to-music experiment,” as it remains part of their AI research and is still in beta.

MusicFX can produce tunes 30, 50, or 70 seconds in length. You only need to feed it a text description, and after analyzing your wording, the neural network will offer two variants. Personally, I was amazed by the quality of the resulting tracks (especially bearing in mind how horrible MusicLM’s attempts sounded just a year ago). Here’s an example:

What do you imagine when you hear the melody above? A wheat field lit by the tender rays of sunset? Maybe a sad-looking woman in a white dress, touching the grain as she wanders around? A slow melancholic scene from a period drama by Terrence Malick? These images occur in my head. However, my text prompt to MusicFX included nothing of the sort. On the contrary, it went: “A film score instrumental piece for a dark fantasy indie movie, featuring a fight scene between a witch and her hunter.”

So, yes, Google’s AI music generator didn’t provide anything close to my original request. At the same time, the created melody does sound consistent (at least, to my rookie ear) and ignites an emotional response. After some tests, it also became clear that this AI generates better results when your prompt uses style terms (like “jazz”), names the required instruments (“drums,” “guitar,” “strings”), or specifies the tempo (fast, slow, mid-paced).

You can try MusicFX for free here, and please let me know if you share my excitement (although, I’d better call it “afraidment,” as such AI advances always have a bitter aftertaste in the beginning).

Generated lyrics and vocals? No problem!

The rapid development of AI music has gone even further, though. Here’s the full CineD song that Nino brought to the office, as mentioned above:

This track was created by Udio, based solely on the text prompt: “A song about CineD, a filmmaking technology platform, pop song, indie.” No other settings, commands, or alterations were needed. The neural network wrote the lyrics and added AI-generated vocals. It is indeed a radically simplified approach to composing music.

Udio’s team consists of ex-Google DeepMind researchers, so it doesn’t surprise me that they have enough expertise in AI training. Their tool gets frequent updates. For example, earlier in May, developers introduced the so-called “Inpainting” feature. It allows users to select a portion of a track to re-generate based on the surrounding context. This improvement should help to edit single vocal lines, correct errors, or smooth transitions. However, inpainting is currently available only for subscribers.

Another attempt at an AI-generated film score

Udio also has a free plan that grants users 10 credits per day plus 100 extra credits per month (1 generation = 1 credit). That’s why I decided to try my luck at creating a dark fantasy score again, using the same prompt as with MusicFX earlier. Like Google’s AI, Udio generates two different tracks at once, 32 seconds each. Here’s my favorite:

The deep-learning model gave my melody a title – “Witch Requiem” – and labeled it with mood specifications. In my case, the track was marked as “atmospheric” and “suspenseful,” although I didn’t use these words in the text input. What do you think? Much closer to the original idea, isn’t it? I wouldn’t use it for my “dirty fight with magic elements” scene though. The created score is too slow and epic for my taste.

If 32 seconds is not enough, you can always click on the “Extend” button, which offers users some level of control. For example, the app will let you decide where the extension should be placed and whether you want to write a separate text description for the new part.

A screenshot of Udio’s browser interface and “Extend” settings. Image source: Mascha Deikova / CineD

Comparing popular AI music generators

For the sake of the experiment, I had to try out another AI music generator, which is quite popular nowadays and regularly pops up in my social media feed. It is called Suno, and its founding team also came from big tech companies (Meta, TikTok) before starting their own business.

Right off the bat, Suno has a very similar interface and workflow. It offers an almost identical basic free plan (50 credits that renew daily, which equals 10 songs) and also runs two jobs at once. On the other hand, the generated tracks are much longer (two minutes each), and overall, the tool works considerably faster than Udio.

In terms of quality and language understanding, I’d like you to judge along with me. Here’s my preferred dark fantasy score for the fight between a witch and her hunter, generated by Suno:

Suno titled this track “Midnight Duel” and added a cover to it (possibly also AI-generated). I feel that the rhythm and orchestral work hit a bit closer to home, but still, this song reminds me more of a generic computer game than a cinematic experience. What would you say?

Of course, I also couldn’t resist making yet another CineD melody using Nino’s original prompt. What’s useful is that this AI model publishes the created lyrics directly on the same page. (By the way, both of these tools allow you to upload your own lyrics before generating the track.)

A screenshot of Suno’s interface. Image source: Mascha Deikova/CineD

If only it could get the pronunciation of “CineD” correct, I would give it a better mark than Udio! (Just kidding! I could have written “CineDee” in the prompt, as Nino did, so I shouldn’t blame AI). Somehow, I like Suno’s sound quality better.

Pros, cons, limitations

On the bright side, using AI music generators is fun, especially if you have always secretly dreamed of becoming a musician but never had the time or financial means for proper training. Additionally, it’s a fast way to have something melodic under your footage. (Previz or a first rough cut could be perfect areas of application for these tools.) I can also imagine that in the long run, AI music generators could replace the tedious search process on stock music platforms. But first, AI developers would have to resolve the ethical dilemma we always talk about. (What music were their neural networks trained on? Do the original musicians and composers get attribution or residuals? Can AI teams really provide commercial rights to their users?) As an example, Udio allows sharing the created content on social media, as long as you properly indicate that it was AI-generated and what tool you used.

Other negatives and limitations:

  • The discussed tools do not give us enough creative control over the output. For instance, I can’t ask Suno to use a male voice instead of a female one or to avoid violins at all costs. At least for now.
  • As we’ve seen above, the results do not always follow your initial request. Sometimes, the created melodies sound too generic. Other times, they might completely miss the required mood. It’s not like working with a professional composer who feels the story, has a personal style or sonic vision, and can work all your wishes into the score.
  • Also, AI for sure won’t come up with an idea to incorporate the recordings of real power plant sounds into the score of “Chernobyl.”

Some alternative approaches

Naturally, it’s impossible to try out ALL the AI tools that are thriving in an already crowded market. I must mention, though, that there are approaches to generating melodies other than text-to-music models. Here’s a bunch of examples, if you want to try out something different:

  • Soundraw. This browser-based application doesn’t accept text prompts, but it offers you much more control over melodic parts and even allows users to upload a video preview (in case you want to create a suitable soundtrack).
  • Boomy. In this AI music generator, you can choose particular instruments, rearrange specific sections, precisely change the tempo of your melody, and add your own voice or an AI one. However, as hard as I tried, I couldn’t make it sound better than cacophony. Probably, I lack the talent in general, so please, give it another go for me and share your insights afterwards.
  • Loudly. This model is user-friendly and flexible, letting you tweak countless settings before it generates a song. Loudly generates no vocals or lyrics so far, but the created melody sounds more interesting than all my previous attempts.

Different parameters in the Loudly browser-based application. Image source: Mascha Deikova/CineD

The future of film music

I think the discussion in the comments is inevitable. When it comes to AI, we all wonder whether it will reshape creative processes as we know them now. In my opinion (and hopefully, not only mine), human value is still what is scarce. If everyone can make something, then it won’t be that valuable. So, it’s not the AI that is going to replace us; it is creators with original ideas who know how to implement AI tools into their workflows.

And what do you think? Have you tried AI music generators? Are there any you particularly like that I didn’t mention? In what cases would you go for AI-generated music instead of buying human-composed pieces? Let’s talk in the comments below!

Feature image source: generated with Midjourney for CineD.


