Silbo Gomero Speech Corpus
Identifier: SLR137
Summary: Corpus of the Silbo Gomero whistled language, based on 49 minutes of recordings created by 4 whistlers.
Category: Speech
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Downloads (use a mirror closer to you):
README.txt [3.0K] (Readme file) Mirrors: [US] [EU] [CN]
words.zip [210M] (Single-word clips with transcripts) Mirrors: [US] [EU] [CN]
fragments.zip [232M] (Short fragments with transcripts) Mirrors: [US] [EU] [CN]
sentences.zip [86M] (Whole sentences with transcripts) Mirrors: [US] [EU] [CN]
About this resource:
The corpus consists of 3 parts, each made from the same data but edited in a different way; a separate transcription file is provided for each part.
- 'words.zip' contains clips of single, separate words. Some clips may contain more than one word, in cases where the words could not be separated.
- 'sentences.zip' contains clips of entire sentences. Some parts of the recordings are not represented here; for example, one recording contained a poem, which could not be separated into sentences.
- 'fragments.zip' contains clips of short fragments of speech (on average, about 6.5 words long); these fragments were made by splitting the recordings wherever longer pauses between words occurred.
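The internal layout of the three archives (audio format, folder names, transcription file name) is not described on this page, so a quick inventory after downloading can help. The following is a minimal Python sketch, not part of the corpus tooling; the helper name summarize_archive is hypothetical, and it assumes only that the downloads are ordinary zip files.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files in a zip archive, grouped by lowercased extension.

    Hypothetical helper: it makes no assumption about how the corpus
    archives are organised internally.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python inventory.py words.zip fragments.zip sentences.zip
    for path in sys.argv[1:]:
        print(path, dict(summarize_archive(path)))
```

Running it on each downloaded archive shows which file types are present, which should make the transcription file easy to spot next to the clips.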
This corpus was created by Agata Jakubiak, a student at the University of Warsaw, from data provided by Francisco Javier Correa of the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero), as part of research into Automatic Speech Recognition of whistled speech.
You can cite the data using the following BibTeX entry:
@inproceedings{jakubiak23_interspeech,
  author={Agata Jakubiak},
  title={{Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={3402--3406},
  doi={10.21437/Interspeech.2023-989}
}