Tree-Constrained Pointer Generator (TCPGen) for Whisper Biasing

[Paper]

End-to-end automatic speech recognition (ASR) models and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite this scale, infrequent content words that occur in a particular task may still exhibit poor ASR performance, with contextual biasing a possible remedy. This paper investigates the effectiveness of neural contextual biasing for Whisper combined with GPT-2. Specifically, it proposes integrating an adapted tree-constrained pointer generator (TCPGen) component into Whisper, together with a dedicated training scheme, to dynamically adjust the final output without modifying any Whisper model parameters. Experiments across three datasets show a considerable reduction in errors on biasing words with a biasing list of 1000 words. Contextual biasing was more effective when applied to domain-specific data, and it can boost the performance of Whisper and GPT-2 without losing their generality.
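For orientation: TCPGen builds a prefix tree over the (subword-tokenised) biasing list, restricts a pointer distribution to the tree branches that are valid at each decoding step, and interpolates it with Whisper's output distribution, leaving Whisper's own parameters frozen. The sketch below is a minimal illustration of those two ingredients, assuming a generic subword tokeniser and a per-step generation probability `p_gen` already predicted by the biasing component; all names here are illustrative, not the repository's API.

```python
class TrieNode:
    """One node of the prefix tree over tokenised biasing words."""
    def __init__(self):
        self.children = {}  # subword id -> TrieNode


def build_prefix_tree(biasing_words, encode):
    """Build the tree; `encode` maps a word to a list of subword ids."""
    root = TrieNode()
    for word in biasing_words:
        node = root
        for tok in encode(word):
            node = node.children.setdefault(tok, TrieNode())
    return root


def valid_next_tokens(node):
    """Subword ids that continue some biasing word from this node."""
    return list(node.children.keys())


def tcpgen_interpolate(p_model, p_ptr, p_gen):
    """Final output distribution (vocabulary-sized tensors or arrays):
    P(y) = (1 - p_gen) * P_model(y) + p_gen * P_ptr(y).
    P_ptr has mass only on tokens from valid_next_tokens, so the biasing
    list steers decoding without modifying Whisper's weights."""
    return (1.0 - p_gen) * p_model + p_gen * p_ptr
```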

Dependencies

All required packages for Whisper

Data and biasing list preparation

We use LibriSpeech as an example; the same procedure applies to SLURP and DSTC as well.

  1. Dump features
cd data/LibriSpeech
python dump_feature.py

Note that you need to change setname='train-clean-100' in dump_feature.py to the set you want to process; a sketch of what the dump does follows below.
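For reference, this is roughly what the feature dump amounts to for one utterance, assuming the openai-whisper package is installed; the actual looping, file layout, and output paths in dump_feature.py may differ, and the utterance path below is illustrative.

```python
import torch
import whisper  # the openai-whisper package

setname = "train-clean-100"  # change to the set you want to process

# dump_feature.py would loop over every utterance in the set.
audio = whisper.load_audio(f"{setname}/103/1240/103-1240-0000.flac")
audio = whisper.pad_or_trim(audio)        # Whisper's fixed 30-second window
mel = whisper.log_mel_spectrogram(audio)  # (80, 3000) log-Mel feature matrix
torch.save(mel, "103-1240-0000.pt")
```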

  2. Biasing lists

Biasing lists are already prepared:

rareword_error.txt: error-based biasing list for training

all_rare_words.txt: full biasing list for inference

Use get_rarewords.py to generate JSON data files containing per-utterance biasing words, e.g. train_clean_100_error.json, which is used for training.
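The exact JSON schema written by get_rarewords.py is not documented here; the sketch below only illustrates the idea, i.e. intersecting each reference transcript with the rare-word list to obtain per-utterance biasing words. The field names are assumptions.

```python
import json

with open("rareword_error.txt") as f:
    rare_words = {line.strip() for line in f if line.strip()}

# utterance id -> reference transcript, loaded from your data preparation
utterances = {"103-1240-0000": "CHAPTER ONE MISSUS RACHEL LYNDE IS SURPRISED"}

data = {}
for utt_id, text in utterances.items():
    data[utt_id] = {
        "words": text,
        # per-utterance biasing words: transcript words on the rare-word list
        "blist": sorted(w for w in set(text.split()) if w in rare_words),
    }

with open("train_clean_100_error.json", "w") as f:
    json.dump(data, f, indent=2)
```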

Training

Run the training script train_large.sh.

Decoding

Run the decoding script decoding.sh.

Scoring

Score with score.sh after decoding. Use error_analysis/get_error_word_count.py to calculate R-WER, passing <path_to_results.txt> as its argument.
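get_error_word_count.py implements the official accounting. Purely to illustrate the metric, the sketch below computes R-WER as errors on biasing-list words divided by their count in the reference, via a standard Levenshtein alignment; the repository script's exact treatment (e.g. of insertions) may differ.

```python
def align(ref, hyp):
    """Levenshtein alignment of two word lists, returning (ref_w, hyp_w)
    pairs where None marks an insertion or deletion."""
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i and j and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1])); i -= 1; j -= 1
        elif i and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], None)); i -= 1   # deletion
        else:
            pairs.append((None, hyp[j - 1])); j -= 1   # insertion
    return reversed(pairs)


def rare_wer(ref, hyp, rare_words):
    """Errors on biasing words over their total count in the reference."""
    total = errors = 0
    for r, h in align(ref, hyp):
        if r in rare_words:
            total += 1
            errors += (r != h)
    return errors / max(total, 1)


print(rare_wer("the nautilus surfaced".split(),
               "the nautilus surface".split(),
               {"nautilus", "surfaced"}))  # 0.5
```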

Expected results (test-clean)

| System | WER | R-WER |
| --- | --- | --- |
| Whisper large unnormalised | 4.0% | 10.4% |
| Whisper large + TCPGen unnormalised | 3.4% | 8.3% |
| Whisper large normalised | 2.5% | 8.1% |
| Whisper large + TCPGen normalised | 2.3% | 7.0% |
