---------------------------------------------------------------------------------
Silbo Gomero Speech Corpus
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Description
---------------------------------------------------------------------------------

This is a corpus of the Silbo Gomero whistled language, a whistled form of Spanish used on the island of La Gomera. It was created from 49 minutes of raw recordings. The recordings contain read speech and were produced by 4 fluent whistlers. They were created for teaching the language to children native to the island, and can be assumed to represent expert use of it.

The corpus consists of 3 parts, all made from the same data but edited in different ways. A separate transcription file is provided for each part.

- words.zip contains clips of single, separate words. Some clips may contain more than one word, in cases where separation was not possible.
- sentences.zip contains clips of entire sentences. Some parts of the recordings are not represented here; for example, one recording contained a poem, which could not be separated into sentences.
- fragments.zip contains clips of short fragments of speech (on average, about 6.5 words long). The fragments were made by splitting the recordings wherever longer pauses between words occurred.

---------------------------------------------------------------------------------
Data Format
---------------------------------------------------------------------------------

Audio files are in the .wav format, encoded at a 44100 Hz sampling rate.
The file names use the following convention:

    {Speaker_ID}-{'word'/'sentence'/'fragment'}-{utterance_ID}.wav

---------------------------------------------------------------------------------
Attribution
---------------------------------------------------------------------------------

This corpus was created by Agata Jakubiak, a student at the University of Warsaw.

The recordings were provided by Francisco Javier Correa, working for the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero). They were recorded by Ana Luz Arteaga, Francisco Javier Correa, Juan Manuel Chinea and Silvia Martín, while working for the same project.

You can cite the data using the following BibTeX entry:

    @inproceedings{jakubiak23_interspeech,
      author={Agata Jakubiak},
      title={{Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language}},
      year=2023,
      booktitle={Proc. INTERSPEECH 2023},
      pages={3402--3406},
      doi={10.21437/Interspeech.2023-989}
    }

---------------------------------------------------------------------------------
Licensing
---------------------------------------------------------------------------------

This corpus is published under the CC BY-NC-SA 4.0 License (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode)
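
As an illustration of the Data Format section, the following Python sketch splits a clip file name into its three parts and reads the sampling rate of a clip. It is not part of the corpus tooling. The names parse_clip_name, ClipName and sampling_rate are hypothetical, and the sketch assumes that the clip type appears literally as "word", "sentence" or "fragment", and that the files are PCM .wav files readable by Python's standard wave module.

```python
import re
import wave
from typing import NamedTuple

# Assumed pattern for {Speaker_ID}-{'word'/'sentence'/'fragment'}-{utterance_ID}.wav.
# The exact shape of the two IDs is not documented, so any non-empty text is accepted.
NAME_RE = re.compile(
    r"^(?P<speaker>.+?)-(?P<kind>word|sentence|fragment)-(?P<utterance>.+)\.wav$"
)


class ClipName(NamedTuple):
    """Hypothetical container for the three parts of a clip file name."""
    speaker_id: str
    kind: str
    utterance_id: str


def parse_clip_name(filename: str) -> ClipName:
    """Split a clip file name into speaker ID, clip type and utterance ID."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected clip file name: {filename!r}")
    return ClipName(match["speaker"], match["kind"], match["utterance"])


def sampling_rate(path: str) -> int:
    """Return the sampling rate (Hz) stored in the header of a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()
```

A caller could compare the value returned by sampling_rate with 44100 before further processing, and group clips by the speaker_id field returned by parse_clip_name.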