LibriTTS is a multi-speaker corpus of approximately 585 hours of read English speech at a 24kHz sampling rate, prepared by Heiga Zen with the assistance of Google Speech and Google Brain team members. The corpus is designed for TTS research. It is derived from the original materials of the LibriSpeech corpus (mp3 audio files from LibriVox and text files from Project Gutenberg). The main differences from the LibriSpeech corpus are listed below:
  1. The audio files are at a 24kHz sampling rate.
  2. The speech is split at sentence breaks.
  3. Both original and normalized texts are included.
  4. Contextual information (e.g., neighbouring sentences) can be extracted.
  5. Utterances with significant background noise are excluded.
For more information, refer to the paper "LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech", Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, and Yonghui Wu, arXiv, 2019. If you use the LibriTTS corpus in your work, please cite this paper.

The MD5 checksums of the downloads are as follows. They are only needed if you want to verify that an archive was downloaded intact.

0c3076c1e5245bb3f0af7d82087ee207  dev-clean.tar.gz
815555d8d75995782ac3ccd7f047213d  dev-other.tar.gz
7bed3bdb047c4c197f1ad3bc412db59f  test-clean.tar.gz
ae3258249472a13b5abef2a816f733e4  test-other.tar.gz
4a8c202b78fe1bc0c47916a98f3a2ea8  train-clean-100.tar.gz
a84ef10ddade5fd25df69596a2767b2d  train-clean-360.tar.gz
7b181dd5ace343a5f38427999684aa6f  train-other-500.tar.gz
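
The checksums above can be checked by hand with a tool such as md5sum, or with a short script. The following is a minimal Python sketch, assuming the archives were saved under their original file names in a single directory. The function names md5_of_file and verify_downloads are illustrative and are not part of any LibriTTS tooling.

```python
import hashlib
from pathlib import Path

# Expected MD5 checksums, copied from the list above.
EXPECTED_MD5 = {
    "dev-clean.tar.gz": "0c3076c1e5245bb3f0af7d82087ee207",
    "dev-other.tar.gz": "815555d8d75995782ac3ccd7f047213d",
    "test-clean.tar.gz": "7bed3bdb047c4c197f1ad3bc412db59f",
    "test-other.tar.gz": "ae3258249472a13b5abef2a816f733e4",
    "train-clean-100.tar.gz": "4a8c202b78fe1bc0c47916a98f3a2ea8",
    "train-clean-360.tar.gz": "a84ef10ddade5fd25df69596a2767b2d",
    "train-other-500.tar.gz": "7b181dd5ace343a5f38427999684aa6f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(download_dir):
    """Check every known archive that is present in download_dir.

    Returns a dict mapping file name to True (checksum matches)
    or False (mismatch). Archives that are not present are skipped.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
    return results
```

For example, verify_downloads(".") checks whichever of the seven archives are present in the current directory. A value of False suggests that the file is incomplete or corrupted and should be downloaded again.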