Open Speech and Language Resources



NICT-Tib1

Identifier: SLR158

Summary: 33.5-hour Lhasa-Tibetan read-speech corpus with Kaldi-style transcripts

Category: Speech

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Downloads (use a mirror closer to you):
tibetan.zip [3.0G]   ( Tibetan read speech audio + transcripts (33.5 h, 20 speakers) )   Mirrors: [US]   [EU]   [CN]  

About this resource:

NICT-Tib1: Lhasa-Tibetan Read-Speech Corpus (v1.0, released 2024-08-27)

NICT-Tib1 is an open, CC BY 4.0-licensed audio corpus intended for developing and benchmarking automatic speech recognition (ASR) systems for Tibetan. It contains 33.5 hours of clean read speech recorded under studio conditions from 20 native speakers of the Lhasa dialect (8 male, 12 female, aged 15–30). Speakers read news manuscripts aloud. Each utterance is indexed in Kaldi-style files (wav.scp for the audio path, label.txt for the transcription), so the data can serve as both training and test material.

Package contents

  • tibetan.zip (~3 GB)
    • 16-kHz, 16-bit mono .wav files (one per utterance)
    • wav.scp – Kaldi mapping of utterance IDs to audio paths
    • label.txt – Kaldi transcription file (UTF-8 Tibetan script)
    • Per-speaker directory structure: data/<spk-id>/<session-id>/
    • README (collection protocol, microphone setup, segment duration statistics)
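
The two index files listed above can be joined on utterance ID to obtain (ID, audio path, transcript) triples. The following is a minimal sketch, not part of the official release. It assumes that both wav.scp and label.txt use the usual Kaldi text layout of one entry per line, "<utt-id> <value>", separated by whitespace, and that the paths in wav.scp resolve from the current working directory. The function names are hypothetical, and the README in the archive remains the authority on the actual layout.

```python
import wave


def read_kaldi_table(path):
    """Parse a Kaldi-style text table with one '<utt-id> <value>' entry per line.

    Assumption: label.txt follows the same layout as wav.scp.
    """
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the first whitespace run only: the value
            # (a path or a transcript) may itself contain spaces.
            parts = line.split(maxsplit=1)
            table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def load_utterances(wav_scp, label_txt):
    """Return sorted (utt_id, audio_path, transcript) triples.

    Only utterance IDs present in both files are kept.
    """
    wavs = read_kaldi_table(wav_scp)
    labels = read_kaldi_table(label_txt)
    common = sorted(wavs.keys() & labels.keys())
    return [(utt, wavs[utt], labels[utt]) for utt in common]


def has_expected_format(wav_path, rate=16000, sample_width=2, channels=1):
    """Check a file against the stated 16 kHz, 16-bit (2-byte), mono format."""
    with wave.open(str(wav_path), "rb") as w:
        actual = (w.getframerate(), w.getsampwidth(), w.getnchannels())
    return actual == (rate, sample_width, channels)
```

After unpacking the archive, calling load_utterances("wav.scp", "label.txt") from the directory that contains both files should yield one triple per utterance, and has_expected_format can be run on a sample of the returned paths as a quick sanity check. If the paths in wav.scp are relative to a different root, they need to be joined with that root first.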

Licence

All audio and transcripts are distributed under the Creative Commons Attribution 4.0 International licence. You are free to use, share and adapt the material, provided that appropriate credit is given.

Citation

Please cite the following paper when using the corpus:

@inproceedings{nict-tib1,
  title     = {{NICT-Tib1: A Public Speech Corpus of Lhasa Dialect for Benchmarking Tibetan Language Speech Recognition Systems}},
  author    = {Kak Soky and Zhuo Gong and Sheng Li},
  booktitle = {Proc. O-COCOSDA},
  pages     = {1--5},
  year      = {2022},
  doi       = {10.1109/O-COCOSDA202257103.2022.9997917}
}

Questions and feedback can be sent to the corpus maintainers via the contact information on the NICT release page.

External URL: https://ast-astrec.nict.go.jp/en/release/NICT-Tib1/   Official release page