1111 Hours Hindi ASR Challenge 2022 - A challenge on Automatic Speech Recognition (ASR) for Hindi, built on spontaneous telephone speech recordings in regional variations of Hindi shared by Gram Vaani, a social technology enterprise. The regional variation, the spontaneity of the speech, the natural background noise, and the crowdsourced transcriptions of varying accuracy make this a unique corpus for automatic recognition of spontaneous telephone speech.
The data set comprises telephone-quality speech data in Hindi. We will be releasing approximately 1000 hours of unlabeled data and 105 hours of labeled speech data through this challenge. The details of the data sets released for this challenge are as follows:
1) Train set - 100 hours (labeled)
2) Development set - 5 hours (labeled)
3) Unlabeled set - 1000 hours
The Gram Vaani data consists of .mp3 files with a mix of sampling rates from 8 kHz to 48 kHz. The following table shows the sampling rate distribution in the Train and Development sets and in the unlabeled 1000-hour dataset.
Sampling rate | Share of the train and dev datasets | Share of the unlabeled 1000-hour dataset |
8 kHz | 60.87% | 67.63% |
16 kHz | 0.84% | 0.66% |
22 kHz | 0.00% | 3.01% |
24 kHz | 0.00% | 0.08% |
32 kHz | 0.25% | 0.26% |
44 kHz | 34.46% | 25.45% |
48 kHz | 3.56% | 2.87% |
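Because the recordings arrive at mixed sampling rates, an ASR pipeline usually has to bring them to one common rate before feature extraction. The sketch below is only an illustration, not the baseline's method: it encodes the percentages from the table, computes how much of each set is natively at or above a given rate, and builds an ffmpeg command that converts one .mp3 to a mono .wav. The 8 kHz target, the mono downmix, and the helper names `share_at_or_above` and `ffmpeg_resample_cmd` are assumptions made for this example.

```python
from typing import Dict, List

# Percentages copied from the table above, keyed by the sampling-rate
# label in kHz exactly as the table lists it.
TRAIN_DEV: Dict[int, float] = {
    8: 60.87, 16: 0.84, 22: 0.00, 24: 0.00, 32: 0.25, 44: 34.46, 48: 3.56,
}
UNLABELED: Dict[int, float] = {
    8: 67.63, 16: 0.66, 22: 3.01, 24: 0.08, 32: 0.26, 44: 25.45, 48: 2.87,
}


def share_at_or_above(dist: Dict[int, float], min_khz: int) -> float:
    """Return the percentage of data whose sampling rate is >= min_khz."""
    return sum(pct for khz, pct in dist.items() if khz >= min_khz)


def ffmpeg_resample_cmd(src: str, dst: str, target_hz: int = 8000) -> List[str]:
    """Build an ffmpeg command that resamples src to target_hz, mono.

    Assumption: 8000 Hz is used as the default because it is the lowest
    rate in the corpus; the challenge text does not prescribe a target.
    """
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it exists
        "-i", src,              # input file (.mp3 in this corpus)
        "-ac", "1",             # downmix to one channel (assumption)
        "-ar", str(target_hz),  # output sampling rate in Hz
        dst,
    ]


if __name__ == "__main__":
    for name, dist in (("train+dev", TRAIN_DEV), ("unlabeled", UNLABELED)):
        print(name, round(share_at_or_above(dist, 16), 2))
    # Hypothetical file names, shown only to illustrate the command.
    print(" ".join(ffmpeg_resample_cmd("example.mp3", "example.wav")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed. By the table's figures, about 39% of the train and dev data and about 32% of the unlabeled data is natively at 16 kHz or higher, so choosing a 16 kHz target would mean upsampling most of the corpus.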
Baseline results and scripts can be found here (https://github.com/anish9208/gramvaani_hindi_asr#gramvaani_hindi_asr).