Open Speech and Language Resources



Crowdsourced high-quality UK and Ireland English Dialect speech data set.

Identifier: SLR83

Summary: A data set of recordings of male and female speakers of English from various dialects of the UK and Ireland.

Category: Speech

License: Attribution-ShareAlike 4.0 International

Downloads (use a mirror closer to you):
about.html [1.8K]   (Information about the data set)   Mirrors: [US]   [EU]   [CN]
LICENSE [20K]   (License information for the data set)   Mirrors: [US]   [EU]   [CN]
line_index_all.csv [1.9M]   (All utterances in the data set)   Mirrors: [US]   [EU]   [CN]
dialect_info.txt [1.2K]   (Information about the dialects represented in the data)   Mirrors: [US]   [EU]   [CN]
irish_english_male.zip [164M]   (Archive file with recordings from the male Irish English speakers)   Mirrors: [US]   [EU]   [CN]
midlands_english_female.zip [103M]   (Archive file with recordings from the female Midlands English speakers)   Mirrors: [US]   [EU]   [CN]
midlands_english_male.zip [166M]   (Archive file with recordings from the male Midlands English speakers)   Mirrors: [US]   [EU]   [CN]
northern_english_female.zip [314M]   (Archive file with recordings from the female Northern English speakers)   Mirrors: [US]   [EU]   [CN]
northern_english_male.zip [817M]   (Archive file with recordings from the male Northern English speakers)   Mirrors: [US]   [EU]   [CN]
scottish_english_female.zip [351M]   (Archive file with recordings from the female Scottish English speakers)   Mirrors: [US]   [EU]   [CN]
scottish_english_male.zip [620M]   (Archive file with recordings from the male Scottish English speakers)   Mirrors: [US]   [EU]   [CN]
southern_english_female.zip [1.6G]   (Archive file with recordings from the female Southern English speakers)   Mirrors: [US]   [EU]   [CN]
southern_english_male.zip [1.7G]   (Archive file with recordings from the male Southern English speakers)   Mirrors: [US]   [EU]   [CN]
welsh_english_female.zip [595M]   (Archive file with recordings from the female Welsh English speakers)   Mirrors: [US]   [EU]   [CN]
welsh_english_male.zip [757M]   (Archive file with recordings from the male Welsh English speakers)   Mirrors: [US]   [EU]   [CN]
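Once an archive has been downloaded, its transcriptions can be read without extracting the audio. The Python sketch below assumes that each archive contains a member whose name ends in `line_index.csv` and that each row has the form `<line id>, <file id>, <transcription>`, where only the transcription may itself contain commas. The helper names `parse_line_index` and `read_index_from_zip` are hypothetical and are not shipped with the data set.

```python
import io
import zipfile
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    line_id: str
    file_id: str
    text: str


def parse_line_index(lines: Iterable[str]) -> List[Utterance]:
    """Parse rows of the assumed form "<line id>, <file id>, <transcription>".

    Only the transcription may contain commas, so each row is split at most
    twice. Blank rows are skipped; a row with fewer than three fields raises
    ValueError.
    """
    utterances = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        line_id, file_id, text = (field.strip() for field in raw.split(",", 2))
        utterances.append(Utterance(line_id, file_id, text))
    return utterances


def read_index_from_zip(zip_path: str) -> List[Utterance]:
    """Parse every member ending in "line_index.csv" inside a downloaded archive.

    Assumption: the internal archive layout is not documented on this page,
    so we match on the member name suffix instead of a fixed path.
    """
    utterances = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("line_index.csv"):
                # "utf-8-sig" also tolerates a byte-order mark, if one is present.
                with io.TextIOWrapper(archive.open(name), encoding="utf-8-sig") as text:
                    utterances.extend(parse_line_index(text))
    return utterances
```

Calling `read_index_from_zip("welsh_english_male.zip")` would return one `Utterance` per indexed recording. If `line_index_all.csv` uses the same row format (also an assumption), the file can be passed line by line to `parse_line_index`.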

About this resource:

This data set contains transcribed high-quality audio of English sentences recorded by volunteers speaking different dialects of the language. The data set consists of WAV files and a CSV index file (line_index.csv). For each recording, line_index.csv contains a line ID, an anonymized file ID, and the transcription of the audio in the file. The recordings from the Welsh English speakers were collected in collaboration with Cardiff University. The data set contains the following number of lines per subset:
Irish English male: 450
Midlands English female: 246
Midlands English male: 450
Northern English female: 750
Northern English male: 2097
Scottish English female: 894
Scottish English male: 1649
Southern English female: 4161
Southern English male: 4331
Welsh English female: 1199
Welsh English male: 1650
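The per-subset figures above can be summed to obtain overall and per-group totals. In this small Python sketch, the dictionary `LINE_COUNTS` and the helper `totals_by` are illustrative names that simply restate the list; they are not part of the data set.

```python
# Line counts per (dialect, gender) subset, copied from the list above.
LINE_COUNTS = {
    ("Irish", "male"): 450,
    ("Midlands", "female"): 246,
    ("Midlands", "male"): 450,
    ("Northern", "female"): 750,
    ("Northern", "male"): 2097,
    ("Scottish", "female"): 894,
    ("Scottish", "male"): 1649,
    ("Southern", "female"): 4161,
    ("Southern", "male"): 4331,
    ("Welsh", "female"): 1199,
    ("Welsh", "male"): 1650,
}


def totals_by(position):
    """Sum the counts grouped by dialect (position 0) or gender (position 1)."""
    totals = {}
    for key, count in LINE_COUNTS.items():
        group = key[position]
        totals[group] = totals.get(group, 0) + count
    return totals


TOTAL_LINES = sum(LINE_COUNTS.values())
```

This gives 17,877 lines in total: 10,627 from male speakers and 7,250 from female speakers. Southern English is by far the largest dialect group (8,492 lines), and Irish English has only a male subset.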

The data set has been manually quality-checked, but it may still contain errors.

Please report any issues on the GitHub issue tracker: https://github.com/googlei18n/language-resources/issues

See LICENSE file for license information.

Copyright 2018, 2019 Google, Inc.

If you use this data in publications, please cite it as follows:

  @inproceedings{demirsahin-etal-2020-open,
    title = {{Open-source Multi-speaker Corpora of the English Accents in the British Isles}},
    author = {Demirsahin, Isin and Kjartansson, Oddur and Gutkin, Alexander and Rivera, Clara},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    pages = {6532--6541},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.804},
    ISBN = {979-10-95546-34-4},
  }