This is a corpus of Silbo Gomero, a whistled form of Spanish used on the island of La Gomera. It was created from 49 minutes of raw recordings of read speech, produced by 4 fluent whistlers. The recordings were originally made for teaching the language to children native to the island.
The corpus consists of 3 parts, each made from the same data but edited in a different way; a separate transcription file is provided for each part.
This corpus was created by Agata Jakubiak, a student at the University of Warsaw, from data provided by Francisco Javier Correa, working for the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero), as part of research into automatic speech recognition of whistled speech.
You can cite the data using the following BibTeX entry:
@inproceedings{jakubiak23_interspeech,
  author={Agata Jakubiak},
  title={{Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={3402--3406},
  doi={10.21437/Interspeech.2023-989}
}