Hi-Fi Multi-Speaker English TTS Dataset (Hi-Fi TTS) is a multi-speaker English dataset for training text-to-speech models. The dataset is based on public audiobooks from LibriVox and texts from Project Gutenberg.

The Hi-Fi TTS dataset contains about 291.6 hours of speech from 10 speakers with at least 17 hours per speaker sampled at 44.1 kHz.

For more information and the latest dataset statistics, please refer to the paper: "Hi-Fi Multi-Speaker English TTS Dataset" Bakhturina, E., Lavrukhin, V., Ginsburg, B. and Zhang, Y., 2021: arxiv.org/abs/2104.01497.

BibTeX entry for citations:

@article{bakhturina2021hi,
  title={{Hi-Fi Multi-Speaker English TTS Dataset}},
  author={Bakhturina, Evelina and Lavrukhin, Vitaly and Ginsburg, Boris and Zhang, Yang},
  journal={arXiv preprint arXiv:2104.01497},
  year={2021}
}