Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, and Gustav Eje Henter

This is the official code repository of Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis.

Demo Page: https://shivammehta25.github.io/Diff-TTSG/

Huggingface Space: https://huggingface.co/spaces/shivammehta25/Diff-TTSG

We present Diff-TTSG, the first diffusion model that learns to synthesise speech and gestures jointly. Our method is probabilistic and non-autoregressive, and can be trained from scratch on small datasets. In addition, to showcase the efficacy of such systems and pave the way for their evaluation, we describe a set of careful uni- and multi-modal subjective tests for evaluating integrated speech and gesture synthesis systems.

Teaser (Click the image to be redirected to the YouTube video)

Installation

Clone this repository

git clone https://github.com/shivammehta25/Diff-TTSG.git
cd Diff-TTSG

Create a new environment (optional)

conda create -n diff-ttsg python=3.10 -y
conda activate diff-ttsg

Set up Diff-TTSG (this installs all the dependencies and downloads the pretrained models)
- If you are using Linux or macOS, run the following command:
```
make install
```
- Otherwise, install all dependencies and run the alignment build with:
```
pip install -e .
```
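
After either install route, a quick sanity check can confirm that the environment looks as expected. The following is a minimal sketch, not part of the repository: the helper `check_environment` is hypothetical, the module name `diff_ttsg` is assumed from the package directory in the repository, `gradio` is assumed because the UI is launched with the `gradio` command, and `torch` is an assumed dependency.

```python
import importlib.util
import sys


def check_environment(modules=("diff_ttsg", "torch", "gradio"), expected_python=(3, 10)):
    """Report whether the Python version and some top-level modules are available.

    The default module names are assumptions for illustration; adjust them to
    match what your installation actually provides.
    """
    report = {
        # Compare only (major, minor), e.g. (3, 10) for the conda env created above
        "python_ok": sys.version_info[:2] == tuple(expected_python),
    }
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'problem'}")
```

A `problem` entry for a module usually means the install step did not complete in the currently active environment.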
Run the Gradio UI
```
gradio app.py
```

Alternatively, use the notebook synthesis.ipynb.

Pretrained checkpoint (should be downloaded automatically when running either make install or gradio app.py)

Citation information

If you use or build on our method or code for your research, please cite our paper:

@inproceedings{mehta2023diff,
  author={Mehta, Shivam and Wang, Siyang and Alexanderson, Simon and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  title={{D}iff-{TTSG}: {D}enoising probabilistic integrated speech and gesture synthesis},
  year={2023},
  booktitle={Proc. ISCA Speech Synthesis Workshop (SSW)},
  pages={150--156},
  doi={10.21437/SSW.2023-24}
}

Acknowledgement

The code in the repository is heavily inspired by the source code of
