openslr.org

Open Speech and Language Resources

CHiME-6

Identifier: SLR150

Summary: English multi-channel far-field meeting data used in the CHiME-6 Challenge. It is derived from CHiME-5 by fixing some array synchronization errors.

Category: Speech

License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Downloads (use a mirror closer to you):
CHiME6_train.tar.gz [97G] (CHiME-6 training portion) Mirrors: [US] [EU] [CN]
CHiME6_dev.tar.gz [11G] (CHiME-6 development portion) Mirrors: [US] [EU] [CN]
CHiME6_eval.tar.gz [12G] (CHiME-6 evaluation portion) Mirrors: [US] [EU] [CN]
CHiME6_transcriptions.tar.gz [2.4M] (CHiME-6 JSON transcription annotations) Mirrors: [US] [EU] [CN]
CHiME6_floorplans.tar.gz [1.4M] (CHiME-6 floorplans for each session) Mirrors: [US] [EU] [CN]
LICENSE.txt [20K] (CHiME-5 CC BY-SA 4.0 license) Mirrors: [US] [EU] [CN]
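Once the archives above are downloaded, they can be unpacked with any tar implementation. The following is a minimal Python sketch, assuming all archives sit in one local download directory; the helper names `extract_archives` and `find_transcriptions` are hypothetical, and nothing is assumed about the archives' internal layout beyond the transcriptions being `.json` files. Note that the training archive alone is 97G compressed, so check free disk space before extracting it.

```python
import tarfile
from pathlib import Path

# Archive names as listed above; which ones are present locally is up to you.
ARCHIVES = [
    "CHiME6_train.tar.gz",
    "CHiME6_dev.tar.gz",
    "CHiME6_eval.tar.gz",
    "CHiME6_transcriptions.tar.gz",
    "CHiME6_floorplans.tar.gz",
]


def extract_archives(download_dir: Path, out_dir: Path, names=ARCHIVES) -> list[Path]:
    """Hypothetical helper: extract every listed archive found in download_dir
    into out_dir and return the archives that were actually extracted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    root = out_dir.resolve()
    extracted = []
    for name in names:
        archive = download_dir / name
        if not archive.exists():
            continue  # skip portions that were not downloaded
        with tarfile.open(archive, mode="r:gz") as tar:
            # Refuse members whose names would resolve outside out_dir.
            for member in tar.getmembers():
                target = (out_dir / member.name).resolve()
                if target != root and root not in target.parents:
                    raise ValueError(f"unsafe path in {name}: {member.name}")
            # Use the safer 'data' extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(out_dir, filter="data")
            else:
                tar.extractall(out_dir)
        extracted.append(archive)
    return extracted


def find_transcriptions(out_dir: Path) -> list[Path]:
    """Hypothetical helper: list all extracted JSON transcription files."""
    return sorted(out_dir.rglob("*.json"))
```

For example, `extract_archives(Path("downloads"), Path("data"))` followed by `find_transcriptions(Path("data"))` unpacks whatever portions were downloaded and returns the paths of the per-session JSON files. Checking member paths before extraction guards against archives that try to write outside the target directory.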

About this resource:

This is the CHiME-6 dataset as used in the CHiME-6 Challenge in 2020 and the CHiME-7 DASR task in 2023.
It is derived from CHiME-5 by running this array synchronization script. More details are available in:

According to the dataset license, you should cite this dataset using the following BibTeX entries:


@inproceedings{barker18_interspeech,
  author={Jon Barker and Shinji Watanabe and Emmanuel Vincent and Jan Trmal},
  title={{The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines}},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={1561--1565},
  doi={10.21437/Interspeech.2018-1768}
}


@inproceedings{watanabe2020chime,
  title={CHiME-6 Challenge: Tackling multispeaker speech recognition for unsegmented recordings},
  author={Watanabe, Shinji and Mandel, Michael and Barker, Jon and Vincent, Emmanuel and Arora, Ashish and Chang, Xuankai and Khudanpur, Sanjeev and Manohar, Vimal and Povey, Daniel and Raj, Desh and others},
  booktitle={CHiME 2020 - 6th International Workshop on Speech Processing in Everyday Environments},
  year={2020}
}