This is a corpus of Silbo Gomero, a whistled form of Spanish used on the island of La Gomera. It was created from 49 minutes of raw recordings of read speech produced by 4 fluent whistlers. The recordings were originally made for teaching the language to children native to the island.

The corpus consists of 3 parts, each made from the same data edited in a different way; a separate transcription file is provided for each part.

This corpus was created by Agata Jakubiak, a student at the University of Warsaw, from data provided by Francisco Javier Correa, working for the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero), as part of research into automatic speech recognition of whistled speech.

You can cite the data using the following BibTeX entry:

@inproceedings{jakubiak23_interspeech,
  author={Agata Jakubiak},
  title={{Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={3402--3406},
  doi={10.21437/Interspeech.2023-989}
}