Towards an end-to-end speech recognizer for Portuguese using deep neural networks

This repository contains the implementation of the SBRT 2017 paper entitled "Towards an end-to-end speech recognizer for Portuguese using deep neural networks".

Training a character-based all-neural Brazilian Portuguese speech recognition model

The model was trained using four datasets: CSLU Spoltech (LDC2006S16), Sid, VoxForge, and LapsBM1.4. Only the CSLU dataset is paid; the other three are freely available.

Setting up the (partial) Brazilian Portuguese Speech Dataset (BRSD)

You can download the freely available datasets with the provided script (it may take a while):

$ cd data; sh download_datasets.sh

Next, preprocess the downloaded datasets into an HDF5 file:

$ python -m extras.make_dataset --parser brsd
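
Once preprocessing finishes, it can be useful to confirm what was written before starting a long training run. The README does not describe the internal layout of the HDF5 file, so the sketch below makes no assumption about group or dataset names and simply lists whatever it finds. It assumes the h5py package is installed (the requirements list only says "HDF5"), and `summarize_hdf5` is a hypothetical helper, not part of this repository.

```python
import h5py


def summarize_hdf5(path):
    """Return (name, shape, dtype) for every dataset stored in an HDF5 file.

    Hypothetical helper for illustration; it walks the whole file because
    the layout produced by extras.make_dataset is not documented here.
    """
    entries = []

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries
```

Calling `summarize_hdf5(".datasets/brsd/data.h5")` (the path used by the training command in this README) should return one tuple per stored array, which is enough to check that the file is not empty and that the array shapes look plausible.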

Training the network

You can train the network with the main.py script. To train with the default parameters:

$ python main.py train --dataset .datasets/brsd/data.h5

Pre-trained model

You can download an sbrt2017 model pre-trained on the full BRSD dataset (including the CSLU dataset):

$ cd data; sh download_model.sh

You can also evaluate the model against the BRSD test set:

$ python main.py eval --model data/models/sbrt2017.h5 --dataset .datasets/brsd/data.h5
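
The README does not say which metric the eval command reports. For a character-based recognizer, a common measure is the character error rate (CER): the total Levenshtein edit distance between predicted and reference transcriptions, divided by the total number of reference characters. The sketch below illustrates that computation under this assumption. It is not the repository's code, and `edit_distance` and `character_error_rate` are hypothetical names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute, or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length (0.0 if empty)."""
    total_edits = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return float(total_edits) / total_chars if total_chars else 0.0
```

For instance, `character_error_rate(["abcd"], ["abcf"])` counts one substitution over four reference characters.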

Requirements

Python 2.7
Numpy
Scipy
Pyyaml
HDF5
Unidecode
Librosa
Tensorflow
Keras
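
A quick way to see which of these dependencies are missing from the current environment is to try importing each one. The sketch below is an illustration, not part of the repository. The import names are assumptions mapped from the list above (for example, Pyyaml is imported as `yaml`, and HDF5 support is assumed to come from `h5py`), and `find_missing` is a hypothetical helper. It uses only constructs available in both Python 2.7 and Python 3.

```python
import importlib

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["numpy", "scipy", "yaml", "h5py",
                    "unidecode", "librosa", "tensorflow", "keras"]


def find_missing(modules):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `find_missing(REQUIRED_MODULES)` returns an empty list when every package can be imported. Importing TensorFlow can take several seconds, so the check is not instantaneous.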

Acknowledgements

python_speech_features for the audio preprocessing
Google Magenta for the hparams
@robertomest for helping me with everything
SANTOS, S. C. B.; ALCAIM, A. "Reduced Sets of Subword Units for Continuous Speech Recognition of Portuguese". Electronics Letters, v. 36, p. 586-588, 2000.

License

See LICENSE for more information.
