Open Speech and Language Resources



CN-Celeb

Identifier: SLR82

Summary: A Free Chinese Speaker Recognition Corpus Released by CSLT@Tsinghua University

Category: Speech

License: Attribution-ShareAlike 4.0 International

Downloads (use a mirror closer to you):
cn-celeb_v2.tar.gz [22G]   ( Updated CN-Celeb1 with corrected genre and speaker labels )   Mirrors: [US]   [EU]   [CN]  
cn-celeb2_v2.tar.gzaa [26G]   ( Updated CN-Celeb2 with corrected genre and speaker labels (1/3) )   Mirrors: [US]   [EU]   [CN]  
cn-celeb2_v2.tar.gzab [26G]   ( Updated CN-Celeb2 with corrected genre and speaker labels (2/3) )   Mirrors: [US]   [EU]   [CN]  
cn-celeb2_v2.tar.gzac [23G]   ( Updated CN-Celeb2 with corrected genre and speaker labels (3/3) )   Mirrors: [US]   [EU]   [CN]  
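
The CN-Celeb2 download is listed as three pieces (1/3, 2/3, 3/3). Assuming these are byte-wise splits of a single cn-celeb2_v2.tar.gz archive (the page does not state how they were produced), they would need to be joined in order before extraction. The following is a minimal Python sketch of that step; the helper name join_parts and the output filename are illustrative, not part of the release.

```python
import shutil
from pathlib import Path


def join_parts(part_paths, out_path, chunk_size=1024 * 1024):
    """Concatenate split archive pieces, in the order given, into out_path.

    Hypothetical helper: assumes the pieces are a plain byte-wise split
    of one archive, which the listing suggests but does not confirm.
    """
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Stream in chunks so the multi-gigabyte pieces never sit in memory.
                shutil.copyfileobj(src, out, chunk_size)
    return Path(out_path)


if __name__ == "__main__":
    # Sorting by name puts the pieces in aa, ab, ac order.
    parts = sorted(Path(".").glob("cn-celeb2_v2.tar.gza?"))
    if len(parts) == 3:
        join_parts(parts, "cn-celeb2_v2.tar.gz")
```

The joined file could then be unpacked with any tar tool, for example Python's tarfile.open(path, "r:gz"). Note that the joined archive needs roughly 75G of free disk space in addition to the pieces themselves.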

About this resource:

This is a large-scale speaker recognition dataset collected 'in the wild'. It consists of two subsets, CN-Celeb1 and CN-Celeb2. All audio files are single-channel, sampled at 16 kHz with 16-bit precision. CN-Celeb1 contains more than 130,000 utterances from 1,000 Chinese celebrities, and CN-Celeb2 contains more than 520,000 utterances from 2,000 Chinese celebrities; each subset covers 11 real-world genres. The data collection was organized by the Center for Speech and Language Technologies (CSLT), Tsinghua University, and funded by the National Natural Science Foundation of China (No. 61633013) and the Postdoctoral Science Foundation of China (No. 2018M640133). You can cite the data using the following BibTeX entries:
@inproceedings{fan2020cn,
  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
  author={Fan, Yue and Kang, JW and Li, LT and Li, KC and Chen, HL and Cheng, ST and Zhang, PY and Zhou, ZY and Cai, YQ and Wang, Dong},
  booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7604--7608},
  year={2020},
  organization={IEEE}
}

@misc{li2020cn,
  title={CN-Celeb: multi-genre speaker recognition},
  author={Lantian Li and Ruiqi Liu and Jiawen Kang and Yue Fan and Hao Cui and Yunqi Cai and Ravichander Vipperla and Thomas Fang Zheng and Dong Wang},
  year={2020},
  eprint={2012.12468},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
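
The audio format stated in the description (single channel, 16 kHz, 16-bit) can be spot-checked after download. The sketch below uses Python's standard wave module and therefore assumes the file being checked is PCM WAV; if the release ships audio in another container, it would need converting first. The names EXPECTED, wav_format, and matches_stated_format are illustrative.

```python
import wave

# Format stated on this page: mono, 16 kHz, 16-bit (2 bytes per sample).
EXPECTED = {"channels": 1, "sample_rate": 16000, "sample_width_bytes": 2}


def wav_format(path):
    """Read channel count, sample rate, and sample width from a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "sample_width_bytes": w.getsampwidth(),
        }


def matches_stated_format(path):
    """Return True if the file matches the format described for the corpus."""
    return wav_format(path) == EXPECTED
```

Running matches_stated_format over a small sample of files is usually enough to catch a mismatched or re-encoded copy before training.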

People:

Dong Wang, Yue Fan, Hao Cui, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yang Zhang, Yunqi Cai

Contact:

Address: ROOM 1-303, BLDG FIT, CSLT, Tsinghua University

Homepage: http://cslt.org or http://cslt.riit.tsinghua.edu.cn