Hi-Fi Multi-Speaker English TTS Dataset (Hi-Fi TTS)
Identifier: SLR109
Summary: A multi-speaker English dataset for training text-to-speech models
Category: Speech
License: CC BY 4.0
Downloads (use a mirror closer to you):
hi_fi_tts_v0.tar.gz [41G] (Speech and text data) Mirrors: [US] [EU] [CN]
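Once the archive has been downloaded from one of the mirrors, it needs to be unpacked before use. The following is a minimal sketch of one way to do that with Python's standard library. The function name extract_archive and the destination directory are hypothetical, and nothing is assumed here about the layout of the files inside the archive.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names.

    Hypothetical helper for illustration; it is not part of the dataset
    distribution.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    names = []
    # Iterate member by member so a large archive is read in a single pass.
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            # Refuse entries that would be written outside dest_dir.
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            tar.extract(member, path=dest)
            names.append(member.name)
    return names


# Example call (both paths are assumptions about your local setup):
# extract_archive("hi_fi_tts_v0.tar.gz", "data/hi_fi_tts")
```

The archive is listed at 41G, so make sure the destination has at least that much free space in addition to the downloaded file.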
About this resource:
Hi-Fi Multi-Speaker English TTS Dataset (Hi-Fi TTS) is a multi-speaker English dataset for training text-to-speech models. The dataset is based on public audiobooks from LibriVox and texts from Project Gutenberg.
The Hi-Fi TTS dataset contains about 291.6 hours of speech from 10 speakers, with at least 17 hours per speaker. All audio is sampled at 44.1 kHz.
For more information and the latest dataset statistics, please refer to the paper: "Hi-Fi Multi-Speaker English TTS Dataset" Bakhturina, E., Lavrukhin, V., Ginsburg, B. and Zhang, Y., 2021: arxiv.org/abs/2104.01497.
BibTeX entry for citations:
@article{bakhturina2021hi,
  title={{Hi-Fi Multi-Speaker English TTS Dataset}},
  author={Bakhturina, Evelina and Lavrukhin, Vitaly and Ginsburg, Boris and Zhang, Yang},
  journal={arXiv preprint arXiv:2104.01497},
  year={2021}
}