NICT-Tib1
Identifier: SLR158
Summary: 33.5-hour Lhasa-Tibetan read-speech corpus with Kaldi-style transcripts
Category: Speech
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Downloads (use a mirror closer to you):
tibetan.zip [3.0G] (Tibetan read speech audio + transcripts, 33.5 h, 20 speakers) Mirrors: [US] [EU] [CN]
About this resource:
NICT-Tib1: Lhasa-Tibetan Read-Speech Corpus (v1.0, released 2024-08-27)
NICT-Tib1 is an open, CC BY 4.0-licensed audio corpus intended for developing
and benchmarking automatic speech recognition (ASR) systems for Tibetan.
It contains 33.5 hours of clean read speech recorded in studio conditions
from 20 native speakers of the Lhasa dialect (8 male, 12 female, aged 15–30).
Speakers read news manuscripts aloud; each utterance is listed in two
Kaldi-format files, wav.scp (audio paths) and label.txt (transcriptions),
so the data can serve as both training and test material.
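As a rough illustration of how the two Kaldi-format files might be combined, the sketch below pairs each audio path with its transcription. It assumes both files follow the common Kaldi layout of one "<utterance-id> <value>" entry per line, separated by whitespace; the corpus README should be checked for the exact format. The function names read_kaldi_table and pair_audio_with_text are hypothetical and not part of the corpus or of Kaldi.

```python
def read_kaldi_table(path):
    """Read a Kaldi-style text table into {utterance_id: value}.

    Assumption: each non-empty line is "<utterance-id> <value>", where the
    value is an audio path (wav.scp) or a transcription (label.txt).
    """
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only once so the value keeps any internal spaces.
            parts = line.split(maxsplit=1)
            utt_id = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            table[utt_id] = value
    return table


def pair_audio_with_text(wav_scp_path, label_path):
    """Return {utterance_id: (audio_path, transcription)}.

    Only utterance IDs present in both files are kept.
    """
    wavs = read_kaldi_table(wav_scp_path)
    labels = read_kaldi_table(label_path)
    return {utt: (wavs[utt], labels[utt]) for utt in wavs if utt in labels}
```

Calling pair_audio_with_text("wav.scp", "label.txt") from the unpacked archive root would then give one (path, transcription) pair per utterance, which can be split into training and test portions as needed.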
Package contents
Tibetan.zip (~3 GB)
- 16-kHz, 16-bit mono .wav files (one per utterance)
- wav.scp: Kaldi mapping of utterance IDs to audio paths
- label.txt: Kaldi transcription file (UTF-8 Tibetan script)
- Per-speaker directory structure: data/<spk-id>/<session-id>/
- README (collection protocol, microphone setup, segment duration statistics)
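After unpacking, the stated audio format can be spot-checked with Python's standard wave module. The following is a minimal sketch, not an official validation tool: the helper names are hypothetical, and speaker_and_session assumes that each .wav file sits directly inside its data/<spk-id>/<session-id>/ directory, which the listing above does not state explicitly.

```python
import wave
from pathlib import PurePosixPath

# Format stated in the package contents: 16 kHz, 16-bit (2 bytes), mono.
EXPECTED_FORMAT = {"channels": 1, "sample_width": 2, "frame_rate": 16000}


def describe_wav(path):
    """Read the header of a PCM .wav file and return its basic properties."""
    with wave.open(str(path), "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),  # bytes per sample
            "frame_rate": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }


def matches_expected_format(info):
    """True if the file is 16 kHz, 16-bit mono."""
    return all(info[key] == value for key, value in EXPECTED_FORMAT.items())


def speaker_and_session(relative_path):
    """Extract (spk-id, session-id) from data/<spk-id>/<session-id>/<file>.

    Assumption: audio files are stored directly in the session directory.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 4 or parts[0] != "data":
        raise ValueError(f"unexpected path layout: {relative_path}")
    return parts[1], parts[2]
```

Summing duration_s over all files gives a quick cross-check against the advertised 33.5 hours, and grouping by the speaker ID makes it straightforward to build speaker-disjoint subsets.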
Licence
All audio and transcripts are distributed under the Creative Commons Attribution 4.0 International licence. You are free to use, share and adapt the material provided appropriate credit is given.
Citation
Please cite the following paper when using the corpus:
@inproceedings{nict-tib1,
  title = {{NICT-Tib1: A Public Speech Corpus of Lhasa Dialect for Benchmarking Tibetan Language Speech Recognition Systems}},
  author = {Kak Soky and Zhuo Gong and Sheng Li},
  booktitle = {Proc. O-COCOSDA},
  pages = {1--5},
  year = {2022},
  doi = {10.1109/O-COCOSDA202257103.2022.9997917}
}
Questions and feedback can be sent to the corpus maintainers via the contact information on the NICT release page.
External URL: https://ast-astrec.nict.go.jp/en/release/NICT-Tib1/ (official release page)