What is AudioLM?

AudioLM represents a groundbreaking advancement in audio language modeling, focusing on the generation of high-fidelity, coherent speech and piano music without relying on text or symbolic representations. It arranges audio data hierarchically using two unique types of discrete tokens: semantic tokens, produced by a self-supervised model that captures phonetic and melodic elements alongside broader contextual information, and acoustic tokens, sourced from a neural codec that preserves speaker traits and detailed waveform characteristics. The architecture of this model features a sequence of three Transformer stages, starting with the semantic token prediction to form the structural foundation, proceeding to the generation of coarse tokens, and finishing with the fine acoustic tokens that facilitate intricate audio synthesis. As a result, AudioLM can effectively create seamless audio continuations from merely a few seconds of input, maintaining the integrity of voice identity and prosody in speech as well as the melody, harmony, and rhythm in musical compositions. Notably, human evaluations have shown that the audio outputs are often indistinguishable from genuine recordings, highlighting the remarkable authenticity and dependability of this technology. This innovation in audio generation not only showcases enhanced capabilities but also opens up a myriad of possibilities for future uses in various sectors like entertainment, telecommunications, and beyond, where the necessity for realistic sound reproduction continues to grow. The implications of such advancements could significantly reshape how we interact with and experience audio content in our daily lives.

Screenshots and Video

Company Facts

Company Name:
Google
Company Location:
United States
Company Website:
research.google/blog/audiolm-a-language-modeling-approach-to-audio-generation/

Product Details

Deployment
SaaS
Training Options
Documentation Hub
On-Site Training
Video Library
Support
Standard Support
Web-Based Support

Product Details

Target Company Sizes
Individual
1-10
11-50
51-200
201-500
501-1000
1001-5000
5001-10000
10001+
Target Organization Types
Mid Size Business
Small Business
Enterprise
Freelance
Nonprofit
Government
Startup
Supported Languages
English

AudioLM Categories and Features