Given just a few seconds of audio, a new AI system can continue it with natural-sounding speech and music.
AudioLM creates Music
Google researchers' AudioLM generates audio that matches the style of the prompt, including complex sounds such as piano music or people talking, with little audible difference from the original recording.
The method could speed up the process of training AI to generate audio, and it may one day be used to automatically create music to accompany videos.
AI-generated voices are already familiar from widely used home assistants such as Alexa, which rely on natural language processing.
AI music systems such as OpenAI's Jukebox have already produced impressive results, but most current methods require humans to prepare transcriptions and label text-based training data, which takes a great deal of time and effort. Jukebox, for instance, uses text-based data to generate song lyrics.
AudioLM, described in a recent paper that has not yet been peer reviewed, needs neither transcriptions nor labels.
Instead, the system is fed databases of sounds, and machine learning is used to compress the audio into short snippets of sound, called "tokens," without substantially sacrificing the original audio's quality.
This tokenized training data is then fed to a machine-learning model that uses natural language processing techniques to learn the patterns in the sound.
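To make the idea of tokens a little more concrete, here is a deliberately simple Python sketch. It is not AudioLM's tokenizer, which is a learned neural model; as a stand-in, it assumes audio samples in the range -1.0 to 1.0 and maps each sample to one of 16 evenly spaced bins (uniform quantization), so every token is a small integer. The names `tokenize`, `detokenize` and `num_tokens` are hypothetical, chosen for illustration only.

```python
import math


# Toy stand-in for a learned audio tokenizer.
# Assumption: uniform quantization of individual samples, NOT AudioLM's method.
def tokenize(samples, num_tokens=16):
    """Map each sample in [-1.0, 1.0] to an integer token in [0, num_tokens)."""
    tokens = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the valid range
        idx = int((s + 1.0) / 2.0 * num_tokens)
        tokens.append(min(idx, num_tokens - 1))  # s == 1.0 goes in the top bin
    return tokens


def detokenize(tokens, num_tokens=16):
    """Map each token back to the centre of its bin (a lossy reconstruction)."""
    width = 2.0 / num_tokens
    return [-1.0 + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    sample_rate = 8000  # assumed sample rate in Hz
    tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(80)]
    tokens = tokenize(tone)
    restored = detokenize(tokens)
    print(tokens[:10])
    # Each restored sample is within half a bin width (1/16) of the original.
    print(max(abs(a - b) for a, b in zip(tone, restored)))
```

Storing one of 16 tokens takes 4 bits instead of the 16 bits of a typical audio sample, at the cost of a coarser approximation. A learned tokenizer aims for a far better trade-off, keeping much more of the sound's character per token.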
To generate audio, AudioLM is fed a few seconds of sound, and it then predicts what comes next.
This is similar to the way language models such as GPT-3 predict which words and sentences are likely to follow one another.
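The "predict what comes next" step can be illustrated with a toy Python sketch. It assumes the audio has already been turned into integer tokens and swaps in a bigram counting model (which looks only at the previous token) for the far larger neural model AudioLM actually uses. The function names `train_bigram` and `continue_prompt` are hypothetical.

```python
from collections import Counter, defaultdict


# Toy next-token predictor. Assumption: a bigram count model stands in
# for AudioLM's neural network; it only looks at the single previous token.
def train_bigram(sequences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def continue_prompt(counts, prompt, steps):
    """Greedily extend a non-empty prompt with the most likely next token."""
    out = list(prompt)
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break  # token never seen with a successor: stop generating
        out.append(followers.most_common(1)[0][0])
    return out


if __name__ == "__main__":
    # A short, repeating "rhythm" of tokens as training data.
    training = [[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]]
    model = train_bigram(training)
    print(continue_prompt(model, [0, 1], 6))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Because the training sequence repeats, the continuation repeats the same pattern. A model that only sees one previous token cannot keep track of longer structure; large language models such as the Transformer networks behind GPT-3 condition on long stretches of earlier tokens instead.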
The team's audio samples sound quite natural. Piano music generated by earlier AI approaches tends to sound chaotic, whereas AudioLM's piano music flows more smoothly.
According to Roger Dannenberg of Carnegie Mellon University, who studies computer-generated music, AudioLM already produces far better sound quality than earlier music-generation software.
He says AudioLM is surprisingly good at recreating some of the repetitive rhythms found in music made by humans.
To produce authentic piano music, AudioLM must capture in fine detail the subtle vibrations contained in each note as the piano keys are struck. The music must also maintain its harmonies and rhythms over time.
“That’s really impressive, partly because it indicates that they are learning some kinds of structure at multiple levels,” Dannenberg says.