Open Speech and Language Resources



Silbo Gomero Speech Corpus

Identifier: SLR137

Summary: Corpus of the Silbo Gomero whistled language, based on 49 minutes of recordings created by 4 whistlers.

Category: Speech

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Downloads (use a mirror closer to you):
README.txt [3.0K]   ( Readme file )   Mirrors: [US]   [EU]   [CN]  
words.zip [210M]   ( Single-word clips with transcripts )   Mirrors: [US]   [EU]   [CN]  
fragments.zip [232M]   ( Short fragments with transcripts )   Mirrors: [US]   [EU]   [CN]  
sentences.zip [86M]   ( Whole sentences with transcripts )   Mirrors: [US]   [EU]   [CN]  

About this resource:

This is a corpus of the Silbo Gomero whistled language, a whistled form of Spanish used on the island of La Gomera. It was created from 49 minutes of raw recordings of read speech, produced by 4 fluent whistlers. The recordings were originally made for teaching the language to children native to the island.

The corpus consists of 3 parts, each made from the same data but edited in a different way; a separate transcription file is provided for each part.

  • 'words.zip' contains clips of single, separate words. Some clips may contain more than one word where the words could not be separated.
  • 'sentences.zip' contains clips of entire sentences. Some parts of the recordings are not represented here; for example, one recording contained a poem, which could not be separated into sentences.
  • 'fragments.zip' contains clips of short fragments of speech (about 6.5 words long on average); the fragments were made by splitting the recordings at longer pauses between words.
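This page does not say whether the fragments were cut by hand or with a tool. Purely as an illustration of what "splitting at longer pauses" means, here is a minimal sketch that marks 20 ms frames as voiced or silent by their RMS energy and starts a new segment whenever a pause lasts at least 300 ms. The function name `split_at_pauses`, the frame size, the 0.02 energy threshold and the 300 ms minimum pause are all hypothetical choices, not parameters used to build this corpus; samples are assumed to be floats in [-1, 1].

```python
import math
from typing import List, Tuple


def split_at_pauses(
    samples: List[float],
    sample_rate: int,
    frame_ms: float = 20.0,          # hypothetical analysis frame length
    silence_threshold: float = 0.02,  # hypothetical RMS level below which a frame is "silent"
    min_pause_ms: float = 300.0,      # hypothetical minimum pause that ends a fragment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices (end exclusive) of voiced segments
    separated by pauses of at least min_pause_ms. Illustrative sketch only."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_pause_frames = max(1, math.ceil(min_pause_ms / frame_ms))

    # Classify each frame as voiced (True) or silent (False) by RMS energy.
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        voiced.append(rms >= silence_threshold)

    segments = []
    seg_start = None  # frame index where the current segment began
    silent_run = 0    # consecutive silent frames inside the current segment
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Pause is long enough: close the segment after its last voiced frame.
                end_frame = i - silent_run + 1
                segments.append((seg_start * frame_len,
                                 min(end_frame * frame_len, len(samples))))
                seg_start = None
                silent_run = 0

    # Close a segment still open at the end, trimming trailing silence.
    if seg_start is not None:
        end_frame = len(voiced) - silent_run
        segments.append((seg_start * frame_len,
                         min(end_frame * frame_len, len(samples))))
    return segments
```

Short gaps between words (shorter than `min_pause_ms`) stay inside a segment, so each returned span can hold several words, which is the behaviour described for 'fragments.zip'. Real recordings would need the thresholds tuned to the background noise level.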

This corpus was created by Agata Jakubiak, a student at the University of Warsaw, from data provided by Francisco Javier Correa of the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero), as part of research into automatic speech recognition of whistled speech.

You can cite the data using the following BibTeX entry:

@inproceedings{jakubiak23_interspeech,
  author={Agata Jakubiak},
  title={{Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={3402--3406},
  doi={10.21437/Interspeech.2023-989}
}