Meta’s AudioCraft: AI Composing Realistic Music & Sounds

  • Meta announces AudioCraft, a powerful AI framework for generating realistic audio and music from text descriptions or prompts.
  • AudioCraft includes three generative AI models: MusicGen, AudioGen, and EnCodec, each serving different audio-related purposes.
  • Ethical and legal concerns arise as MusicGen learns from existing music, potentially leading to deepfake music and copyright issues. Meta emphasizes transparency but acknowledges biases in the training data.

The day is fast approaching when generative AI won’t only write and create images in a convincingly human-like style, but also compose music and sounds that pass for a professional’s work, too. This morning, tech giant Meta announced a groundbreaking framework called AudioCraft, capable of generating high-quality, realistic audio and music from short text descriptions or prompts.

AudioCraft represents a significant leap in AI-generated audio technology and builds upon Meta’s previous venture into audio generation, the AI-powered music generator MusicGen, which was open-sourced in June. According to Meta, AudioCraft features advancements that greatly enhance the quality of AI-generated sounds, including dogs barking, cars honking, and footsteps on various surfaces.

The Components of AudioCraft

AudioCraft offers three distinct generative AI models, each serving different audio-related purposes:

MusicGen: While MusicGen isn’t new, Meta has now released the training code for it, allowing users to train the model on their own music datasets. However, this raises ethical and legal concerns as MusicGen “learns” from existing music.
AudioGen: This autoregressive model focuses on generating environmental sounds and sound effects. It can create “realistic recording conditions” and “complex scene content.”
EnCodec: An improved version of a previous Meta model, EnCodec is a lossy neural codec specifically designed for efficient audio compression and reconstruction.

Despite the remarkable capabilities of AudioCraft, the framework raises concerns about potential misuse and ethical dilemmas. MusicGen’s ability to learn from existing music and produce similar effects has led to debates about copyright infringement and the production of deepfake music. The issue becomes more complex when homemade tracks created with generative AI go viral and are flagged by music labels for intellectual property concerns.

Although Meta claims that the pretrained version of MusicGen was trained on specifically licensed music, questions remain regarding the model’s potential commercial applications. The lack of clarity on whether “deepfake” music violates copyright laws creates ambiguity for artists, labels, and other rights holders.

Transparency and Bias

To be more transparent, Meta has disclosed details of the data used to train its models. For instance, MusicGen’s training data consists of 20,000 hours of audio, including 400,000 recordings, along with text descriptions and metadata. Notably, vocals were removed from the training data to prevent the model from replicating artists’ voices. However, limitations in the training data have resulted in biases: MusicGen does not perform well with non-English descriptions or non-Western musical styles.
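For a rough sense of scale, the two training-data figures imply an average recording length. The short Python sketch below works that out. It assumes the 400,000 recordings account for the full 20,000 hours, which Meta's figures as quoted here do not confirm, and the function name is purely illustrative.

```python
def average_recording_minutes(total_hours: float, recordings: int) -> float:
    """Return the mean length of one recording, in minutes.

    Assumption (not stated in the article): the recordings together
    make up the entire number of hours given.
    """
    if recordings <= 0:
        raise ValueError("recordings must be a positive integer")
    return total_hours * 60 / recordings


# Figures quoted for MusicGen's training data.
TOTAL_HOURS = 20_000
RECORDINGS = 400_000

average_minutes = average_recording_minutes(TOTAL_HOURS, RECORDINGS)
print(f"Average recording length: {average_minutes} minutes")
```

Under that assumption the average works out to about three minutes per recording, roughly the length of a typical song.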

Meta acknowledges the importance of transparency in model development and aims to make the models accessible to researchers and the music community. The company hopes that, through the development of more advanced controls, such generative AI models can become useful tools for both music amateurs and professionals.

Future Prospects and Challenges

Meta’s AudioCraft represents a significant advancement in the field of AI-generated audio, with potential applications ranging from inspiring musicians to aiding in music composition. However, as the technology continues to evolve, striking a balance between innovation and responsibility becomes crucial.

Meta’s commitment to exploring ways to improve controllability and mitigate limitations and biases in generative audio models is commendable. Nevertheless, the music industry, researchers, and society as a whole must engage in a thoughtful and transparent discussion to navigate the potential challenges and ensure that AI-generated audio is used responsibly and ethically.
