Turn Words into Melodies – AI-Powered Music Generator MusicGen by Meta Introduced

AI here, AI there… nowadays, you can find at least one useful artificial intelligence tool at every stage of the video production process. These tools may enhance your creativity, take over mundane tasks, or speed up your basic workflow. It’s always a surprise to see what modern technology is capable of. The new AI-powered music generator MusicGen by Meta was released to the public only a few weeks ago, but reviewers are already raving about its immense potential. With this tool, you can create high-quality, royalty-free music from a simple text description and use it directly in your project. How? Let’s find out.

Recently, we talked about improvements on stock footage platforms (like Uppbeat or Artlist), which now use artificial intelligence to help users find the perfect clips for their projects. But imagine that you have a very specific music request and hiring a professional composer is, unfortunately, not in the budget. MusicGen by Meta might provide a quick solution in cases like this, and it already looks more promising than its biggest competitor, MusicLM from Google.

MusicGen by Meta and its competitors

First of all, unlike Google, Meta decided to launch their music generation model as an open-source project – a move the community has widely welcomed. Interested users can not only test it but also contribute to its development and create their own variations based on the initial neural network – that is, if they possess the required technical skills and knowledge of machine learning. Don’t worry, though: if you only want to create music, you don’t need any of that.

As you can see and hear in the introduction video above, posted by Felix Kreuk, one of the MusicGen research engineers at Meta, the new AI can use both a text prompt and a piece of music as a starting point for creating a melody. That’s something previous competing generative tools lacked.

In addition to this, in the research paper, Meta compared clips produced by their music generator to examples created by Google’s MusicLM, Riffusion, and Moûsai. The results suggested that “MusicGen performs better than the evaluated baselines as evaluated by human listeners, both in terms of audio quality and adherence to the provided text description.” This is even more impressive considering that MusicLM was trained on roughly ten times as much music data as MusicGen!

How does this new generator work?

From the outside, it’s quite simple (let’s not dive into the complicated world of how machine learning works – it’s a huge topic). You provide the neural network with a basic text description (something like “a cheerful country song with acoustic guitars”), feed it an additional reference track if you want to, and click “generate”. Within seconds, MusicGen comes up with 15 seconds of audio based on your text and musical cues. If you upload source music, the model will try to incorporate its broad melody into the resulting clip. So, getting something like a dark-metal twisted version of the “Friends” intro for your creative YouTube video is not a problem anymore.

MusicGen by Meta – Hugging Face interface for testing it out
My attempt at generating a music clip. Image source: screenshot from the MusicGen Hugging Face space

For the record, your prompts can be much more specific. For example, MusicGen absolutely nails a given number of beats per minute (bpm), which matters when you’re creating loops. In the realm of film soundtracks, even a mere 15 seconds of generated audio can work wonders, and we will delve deeper into the topic of loops shortly. At the same time, users who sign up for the project’s Hugging Face space can access clips of up to 120 seconds.
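Since MusicGen reliably hits a requested tempo, you can plan your edit around it. Here is a minimal sketch (the function name and defaults are my own, not part of MusicGen) that computes how many whole bars fit into a generated clip, so you know where to trim it for a clean loop point:

```python
def bars_in_clip(clip_seconds: float, bpm: float, beats_per_bar: int = 4):
    """Return (whole bars that fit, trimmed length in seconds) for a clean loop."""
    bar_seconds = beats_per_bar * 60.0 / bpm       # duration of one bar
    whole_bars = int(clip_seconds // bar_seconds)  # drop the partial bar at the end
    return whole_bars, whole_bars * bar_seconds

# A 15-second clip at 120 bpm in 4/4 time: each bar lasts 2.0 s,
# so 7 whole bars fit and you would trim the clip to 14.0 s.
print(bars_in_clip(15.0, 120.0))  # -> (7, 14.0)
```

Asking for a tempo where the bar length divides your clip length evenly (say, 120 bpm for a 14- or 16-second cut) avoids wasting generated audio on a dangling partial bar.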

The ethical question

Playing around with the new music generator is fun, but let’s consider the ethical side as well. The impact of generative AI on the creator community is no secret, sparking extensive discussions and debates. In most cases, developers use every available piece of content to train their models without regard for rights and attribution. Meta went a different way. According to the company, MusicGen learned to compose using 10,000 hours of “high-quality” licensed songs and 390,000 instrumental tracks. (The material came mostly from media libraries like Shutterstock and Pond5.) Furthermore, according to the researchers, this dataset is covered by legal agreements with the rights holders, and the overall project is licensed under the MIT license.

MusicGen by Meta - MIT license
License information. Image source: MusicGen by Meta

In the research paper mentioned above, Meta also addresses the ethics of AI-generated music. They name these concerns as one of the reasons for their open-source approach: this way, all players have equal access to the model. The developers write that they don’t want to create unfair competition for artists.

Through the development of more advanced controls, such as the melody conditioning we introduced, we hope that such models can become useful both to music amateurs and professionals.

A quote from the research paper

Music loops and when to avoid them

It seems that the launch of MusicGen by Meta makes the creation of short audio clips a piece of cake. Everyone can now imagine a melody, write a few words, click “generate”, and get a solid base for, say, a loop. However, how and when to use loops in your soundtrack is not an easy question at all and requires mastery. In our MZed course “Cinema Sound”, audio guru Mark Edward Lewis devotes several hours of lessons just to choosing the right music. He also delves into the reasons to use and to avoid loops, and the most significant lesson I took away from this is that a wrongly chosen and placed melody can utterly ruin an actor’s performance.

MusicGen by Meta - working with created loops
Working with loops in an actual film scene. Image source: Mark Edward Lewis / MZed

Take, for example, the scene from the screen version of “Macbeth” above. Throughout the demonstration, Mark accompanies it with various music loops, but unfortunately, all the results turn out to be dreadful. Why? Because loops possess a powerful ability to smooth out the ups and downs of emotion. They tend to give viewers a feeling of time compression and slight suspense. The musical elements repeat as you wait, and wait, and wait for a twist, for something that is about to happen. That can be a useful tool when you need to underline a quiet action scene without dialogue, bring some tension into non-dynamic moments, or even give a scene a comedic touch. But loops are a definite no-go in dramatic moments, as they crush the actors’ performances, flattening out the emotional arc of the scene.

That’s an important tip to consider while generating new loops for your video projects. If you want to learn more, head over to the “Cinema Sound” course, which consists of 85+ hours of engaging expert material on the topic.
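On the craft side, one practical detail when turning a generated clip into a loop: a hard cut from the clip’s end back to its start usually produces an audible click. A minimal sketch (plain NumPy; the function name is my own and not part of MusicGen) that crossfades the clip’s tail into its head so it repeats seamlessly:

```python
import numpy as np

def make_seamless_loop(samples: np.ndarray, sample_rate: int,
                       fade_seconds: float = 0.5) -> np.ndarray:
    """Crossfade the clip's tail into its head so the loop point has no click."""
    n = int(sample_rate * fade_seconds)
    fade_in = np.linspace(0.0, 1.0, n)   # ramp the head up...
    fade_out = 1.0 - fade_in             # ...while the tail ramps down
    blended_head = samples[:n] * fade_in + samples[-n:] * fade_out
    # The result is shorter by one fade length; its last sample now
    # flows directly into its first, so it can repeat indefinitely.
    return np.concatenate([blended_head, samples[n:-n]])

# Example: loop a 15-second mono clip at 32 kHz (MusicGen's output rate).
sr = 32000
clip = np.random.default_rng(0).standard_normal(15 * sr).astype(np.float32)
loop = make_seamless_loop(clip, sr)
print(len(loop) / sr)  # 14.5 seconds after a 0.5 s crossfade
```

A half-second fade is a reasonable starting point for music; shorter fades keep more of the clip but risk a harder transition.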

You can try out MusicGen by Meta for yourself

As already mentioned, MusicGen by Meta is now open to the public. You can try generating your own music clips from text descriptions on the Hugging Face platform directly in the browser. It’s also possible to download the model’s code and run it locally.

Have you already tried this mighty AI to generate some music? How would you rate the results? And what is the next step in artificial intelligence development that you can’t wait to see? Tell us all about it in the comments below!

Full disclosure: MZed is owned by CineD.

Feature image source: created with Midjourney for CineD.

