Load audio data

Audio datasets are loaded from the audio column, which contains three important fields:

array: the decoded audio data represented as a 1-dimensional array.
path: the path to the downloaded audio file.
sampling_rate: the sampling rate of the audio data.

To work with audio datasets, you need to have the audio dependency installed. Check out the installation guide to learn how to install it.

When you load an audio dataset and call the audio column, the [Audio] feature automatically decodes and resamples the audio file:

>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", "en-US", split="train")
>>> dataset[0]["audio"]
{'array': array([ 0.        ,  0.00024414, -0.00024414, ..., -0.00024414,
         0.        ,  0.        ], dtype=float32),
 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav',
 'sampling_rate': 8000}

Index into an audio dataset using the row index first and then the audio column - dataset[0]["audio"] - to avoid decoding and resampling all the audio files in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.

For a guide on how to load any type of dataset, take a look at the general loading guide.

Local files

The path is useful for loading your own dataset. Use the [~Dataset.cast_column] function to take a column of audio file paths, and decode it into array's with the [Audio] feature:

>>> audio_dataset = Dataset.from_dict({"audio": ["path/to/audio_1", "path/to/audio_2", ..., "path/to/audio_n"]}).cast_column("audio", Audio())

If you only want to load the underlying path to the audio dataset without decoding the audio file into an array, set decode=False in the [Audio] feature:

>>> dataset = load_dataset("PolyAI/minds14", "en-US", split="train").cast_column("audio", Audio(decode=False))
>>> dataset[0]
{'audio': {'bytes': None,
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav'},
 'english_transcription': 'I would like to set up a joint account with my partner',
 'intent_class': 11,
 'lang_id': 4,
 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav',
 'transcription': 'I would like to set up a joint account with my partner'}

AudioFolder

You can also load a dataset with an AudioFolder dataset builder. It does not require writing a custom dataloader, making it useful for quickly loading audio data.

AudioFolder with metadata

To link your audio files with metadata information, make sure your dataset has a metadata.csv file. Your dataset structure might look like:

folder/train/metadata.csv
folder/train/first_audio_file.mp3
folder/train/second_audio_file.mp3
folder/train/third_audio_file.mp3

Your metadata.csv file must have a file_name column which links audio files with their metadata. An example metadata.csv file might look like:

file_name,transcription
first_audio_file.mp3,znowu się duch z ciałem zrośnie w młodocianej wstaniesz wiosnie i możesz skutkiem tych leków umierać wstawać wiek wieków dalej tam były przestrogi jak siekać głowę jak nogi
second_audio_file.mp3,już u źwierzyńca podwojów król zasiada przy nim książęta i panowie rada a gdzie wzniosły krążył ganek rycerze obok kochanek król skinął palcem zaczęto igrzysko
third_audio_file.mp3,pewnie kędyś w obłędzie ubite minęły szlaki zaczekajmy dzień jaki poślemy szukać wszędzie dziś jutro pewnie będzie posłali wszędzie sługi czekali dzień i drugi gdy nic nie doczekali z płaczem chcą jechać dali

Metadata can also be specified as JSON Lines, in which case use metadata.jsonl as the name of the metadata file. This format is helpful in scenarios when one of the columns is complex, e.g. a list of floats, to avoid parsing errors or reading the complex values as strings.

Load your audio dataset by specifying audiofolder and the directory containing your data in data_dir:

>>> from datasets import load_dataset

>>> dataset = load_dataset("audiofolder", data_dir="/path/to/folder")

AudioFolder will load audio data and create a transcription column containing texts from metadata.csv:

>>> dataset["train"][0]
{'audio':
    {'path': '/path/to/extracted/audio/first_audio_file.mp3',
    'array': array([ 0.00088501,  0.0012207 ,  0.00131226, ..., -0.00045776, -0.00054932, -0.00054932], dtype=float32),
    'sampling_rate': 16000},
 'transcription': 'znowu się duch z ciałem zrośnie w młodocianej wstaniesz wiosnie i możesz skutkiem tych leków umierać wstawać wiek wieków dalej tam były przestrogi jak siekać głowę jak nogi'
}

You can load remote datasets from their URLs with the data_files parameter:

>>> dataset = load_dataset("audiofolder", data_files="https://s3.amazonaws.com/datasets.huggingface.co/SpeechCommands/v0.01/v0.01_test.tar.gz")

AudioFolder with labels

If your data directory doesn't contain any metadata files, by default AudioFolder automatically adds a label column of [~datasets.features.ClassLabel] type, with labels based on the directory name. It might be useful if you have an audio classification task.

Language identification

Language identification datasets have audio recordings of speech in multiple languages:

folder/train/ar/0197_720_0207_190.wav
folder/train/ar/0179_830_0185_540.mp3
folder/train/ar/0179_830_0185_540.mp3

folder/train/zh/0442_690_0454_380.mp3

As there are no metadata files, AudioFolder will create a label column with the language id based on the directory name:

>>> dataset = load_dataset("audiofolder", data_dir="/path/to/folder", drop_labels=False)
>>> dataset["train"][0]
{'audio':
    {'path': '/path/to/extracted/audio/0197_720_0207_190.mp3',
    'array': array([-3.6621094e-04, -6.1035156e-05,  6.1035156e-05, ..., -5.1879883e-04, -1.0070801e-03, -7.6293945e-04],
    'sampling_rate': 16000}
 'label': 0  # "ar"
}

>>> dataset["train"][-1]
{'audio':
    {'path': '/path/to/extracted/audio/0442_690_0454_380.mp3',
    'array': array([1.8920898e-03, 9.4604492e-04, 1.9226074e-03, ..., 9.1552734e-05, 1.8310547e-04, 6.1035156e-05],
    'sampling_rate': 16000}
 'label': 99  # "zh"
}

If you have metadata files inside your data directory, but you still want to infer labels from directories names, set drop_labels=False as defined in [~datasets.packaged_modules.audiofolder.AudioFolderConfig].

Alternatively, you can add label column to your metadata.csv file.

If you have no metadata files and want to drop automatically created labels, set drop_labels=True. In this case your dataset would contain only an audio column.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

audio_load.mdx

audio_load.mdx

Load audio data

Local files

AudioFolder

AudioFolder with metadata

AudioFolder with labels

Language identification

Files

audio_load.mdx

Latest commit

History

audio_load.mdx

File metadata and controls

Load audio data

Local files

AudioFolder

AudioFolder with metadata

AudioFolder with labels

Language identification