Kallaama
Identifier: SLR151
Summary: Wolof, Pulaar and Sereer data
Category: Speech
License: Creative Commons Attribution 4.0 International (CC-BY-4.0)
Downloads (use a mirror closer to you):
speech_dataset_wol.tar.gz [5.4G] (Wolof speech and transcripts) Mirrors:
[US]
[EU]
[CN]
speech_dataset_fuc.tar.gz [3.1G] (Pulaar speech and transcripts) Mirrors:
[US]
[EU]
[CN]
speech_dataset_srr.tar.gz [3.9G] (Sereer speech and transcripts) Mirrors:
[US]
[EU]
[CN]
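Once an archive has been downloaded from one of the mirrors, it can be unpacked with any standard tar tool. The sketch below uses only Python's standard `tarfile` module. It is a minimal illustration, not part of the dataset distribution. The helper name `extract_archive` and the destination directories are hypothetical, and the internal layout of the archives is not documented on this page. The helper checks each member name before extracting, so a crafted archive cannot write outside the destination directory.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: str, dest: str) -> list[str]:
    """Extract a .tar.gz archive into `dest` and return its member names.

    Hypothetical helper: raises ValueError if any member name would
    resolve to a path outside `dest` (path traversal).
    """
    dest_path = Path(dest).resolve()
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        members = tar.getmembers()
        # Validate every member name before extracting anything.
        for member in members:
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(path=dest_path, members=members)
    return [m.name for m in members]
```

For example, `extract_archive("speech_dataset_wol.tar.gz", "kallaama/wol")` would unpack the Wolof archive into a local `kallaama/wol` directory. Here `kallaama/wol` is an arbitrary example path.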
About this resource:
This work is a result of the Kallaama project, funded by the Lacuna Fund for one year in 2023. The recordings are about agriculture. The speakers are farmers, agricultural advisers, and agri-food business managers. The recordings comprise interactive radio programmes, focus groups, voice messages, push messages and interviews, so spontaneous speech predominates. Audio quality may vary depending on the type of programme.
- speech_dataset_wol.tar.gz: Wolof (ISO 639-2: wol) speech dataset containing 55 hours of transcribed speech, including almost 13 hours of content validated by an expert. It also contains an X-SAMPA lexicon (49,132 phonetised entries) and a text corpus (1,140,508 words).
- speech_dataset_fuc.tar.gz: Pulaar (ISO 639-3: fuc) speech dataset containing nearly 32 hours of transcribed speech, including around 11 hours of content validated by an expert. It also contains a text corpus (742,024 words).
- speech_dataset_srr.tar.gz: Sereer (ISO 639-2: srr) speech dataset containing 38 hours of transcribed speech, including nearly 11 hours of content validated by an expert.
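The Wolof archive includes an X-SAMPA pronunciation lexicon. The exact file name and layout are not described on this page, so the sketch below rests on an assumption. It supposes a common plain-text convention: one entry per line, with the word first, followed by its whitespace-separated X-SAMPA phone symbols. The function `load_lexicon` is hypothetical. Check the actual lexicon file in `speech_dataset_wol.tar.gz` before relying on it.

```python
from collections import defaultdict


def load_lexicon(lines):
    """Parse lexicon lines into {word: [pronunciation, ...]}.

    ASSUMPTION (not documented by the dataset page): each non-empty line is
    `word phone1 phone2 ...`, with the word and its X-SAMPA phones separated
    by whitespace. A word listed on several lines keeps every pronunciation.
    """
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        lexicon[word].append(phones)
    return dict(lexicon)
```

Keeping a list of pronunciations per word, rather than a single one, means that alternative entries for the same word are not silently overwritten.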
These resources, along with the collection methodology and a description of the Kallaama project, are published in the following paper. Please cite this paper if you publish work using these resources, using the following BibTeX entry:
@inproceedings{kallaama2024dataset,
  title={Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal},
  author={Gauthier, Elodie and Ndiaye, Aminata and Guissé, Abdoulaye},
  booktitle={Proceedings of the Fifth workshop on Resources for African Indigenous Languages (RAIL 2024)},
  year={2024}
}
External URLs: https://github.com/gauthelo/kallaama-speech-dataset