This free Mandarin Chinese speech corpus set is released by Shanghai Primewords Information Technology Co., Ltd.

The corpus was recorded on smartphones by 296 native Chinese speakers. The transcription accuracy is higher than 98%, at a confidence level of 95%. It is free for academic use.

The mapping between the transcript and utterance is given in JSON format.
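As a rough illustration of how such a mapping could be consumed, the Python sketch below builds a lookup table from audio file name to transcript. It assumes the JSON file holds a list of records; the field names `file` and `text`, the sample records, and the helper names `build_transcript_map` and `load_transcript_map` are all assumptions made for this example, not details confirmed by this description. Check the actual JSON file and adjust the keys as needed.

```python
import json

# Hypothetical sample data. The record layout and the field names
# "file" and "text" are assumptions, not taken from the corpus itself.
SAMPLE = """[
  {"file": "a.wav", "text": "你好"},
  {"file": "b.wav", "text": "谢谢"}
]"""


def build_transcript_map(records, audio_key="file", text_key="text"):
    """Return a dict mapping each audio file name to its transcript."""
    mapping = {}
    for record in records:
        mapping[record[audio_key]] = record[text_key]
    return mapping


def load_transcript_map(path, audio_key="file", text_key="text"):
    """Read a JSON list of records from `path` and build the same mapping."""
    # UTF-8 is specified explicitly because the transcripts are Chinese text.
    with open(path, encoding="utf-8") as f:
        return build_transcript_map(json.load(f), audio_key, text_key)


transcripts = build_transcript_map(json.loads(SAMPLE))
print(transcripts["a.wav"])
```

The key names are parameters rather than constants so that the same helper can be reused if the real file uses different field names.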

You can cite the data using the following BibTeX entry:

    @misc{primewords_201801,
    title={Primewords Chinese Corpus Set 1},
    author={Primewords Information Technology Co., Ltd.},
    year={2018},
    note={\url{https://www.primewords.cn}}
    }

Contact: Yinghui Liu, yinghui_liu@primewords.cn

External URLs: https://www.primewords.cn