Iban
Identifier: SLR24
Summary: Iban language text and speech corpora for ASR
Category: Speech
License: Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)
Downloads (use a mirror closer to you):
iban.tar.gz [913M] (Iban language corpora)
Mirrors:
[US]
[EU]
[CN]
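After downloading, the archive has to be unpacked before use. The sketch below shows one way to do that with Python's standard `tarfile` module. It is not part of the release: the function name `safe_extract` and the destination directory are hypothetical. It refuses archive members whose paths would escape the destination directory, and it uses the `"data"` extraction filter on Python versions that provide it.

```python
import os
import tarfile


def safe_extract(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a .tar.gz archive into dest_dir (hypothetical helper).

    Raises ValueError if any member would land outside dest_dir, for
    example through '..' components or absolute paths. Returns the
    member names that were extracted.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Check every member path before extracting anything.
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Newer Pythons ship extraction filters; use the strict "data" one if present.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_root, members=members, filter="data")
        else:
            tar.extractall(dest_root, members=members)
    return [m.name for m in members]
```

A typical call would be `safe_extract("iban.tar.gz", "corpora")`. Running `tar -xzf iban.tar.gz` in a shell achieves the same result for a trusted download.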
About this resource:
INTRODUCTION
This package contains Iban language text and speech suitable for Automatic Speech Recognition (ASR) experiments. In addition to transcribed speech, a 2M-token text corpus crawled from online newspaper sites is provided. The news data were provided by a local radio station in Sarawak, Malaysia.
PUBLICATION ON IBAN DATA AND ASR
Details on the corpora and on our experiments on Iban ASR can be found in the following publications. We would appreciate it if you cite them in any work you publish using these data.
@inproceedings{Juan14,
  Author = {Sarah Samson Juan and Laurent Besacier and Solange Rossato},
  Title = {Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban},
  Booktitle = {Proceedings of the Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU)},
  Month = {May},
  Year = {2014}
}
@inproceedings{Juan2015,
  Author = {Sarah Samson Juan and Laurent Besacier and Benjamin Lecouteux and Mohamed Dyab},
  Title = {Using resources from a closely-related language to develop ASR for a very under-resourced language: A case study for Iban},
  Booktitle = {Proceedings of INTERSPEECH},
  Address = {Dresden, Germany},
  Month = {September},
  Year = {2015}
}
Original source of the corpus
This OpenSLR release was created from data originally provided by Sarah Juan, but the format was changed to better fit Kaldi conventions. Some of the original files were removed because they are now generated automatically by the Kaldi Iban recipe. The original source of the corpus is https://github.com/sarahjuan/iban. See the README there for more details; most of it still applies.
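Kaldi data directories store per-utterance metadata as plain-text tables. Each line has the form `<key> <value>`. In `text`, the key is an utterance ID and the value is its transcription. In `utt2spk`, the key is an utterance ID and the value is a speaker ID. The sketch below reads such tables, checks that `text` and `utt2spk` cover the same utterances, and derives `spk2utt` in the way Kaldi's `utils/utt2spk_to_spk2utt.pl` does. The function names and the example directory are hypothetical. Which of these files ship in this release and which are generated by the Kaldi Iban recipe should be checked against the recipe itself. UTF-8 encoding is also an assumption.

```python
import os
from collections import defaultdict


def read_kaldi_table(path: str) -> dict[str, str]:
    """Read a Kaldi-style table (hypothetical helper).

    Each non-empty line is '<key> <value...>'. The key is the first
    whitespace-separated field, and the rest of the line is the value.
    """
    table: dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            parts = line.split(maxsplit=1)
            key = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            if key in table:
                raise ValueError(f"{path}:{line_no}: duplicate key {key}")
            table[key] = value
    return table


def load_data_dir(data_dir: str):
    """Load 'text' and 'utt2spk' from a Kaldi data directory (assumed layout).

    Returns (text, utt2spk, spk2utt), where spk2utt maps each speaker to
    its utterance IDs in sorted order.
    """
    text = read_kaldi_table(os.path.join(data_dir, "text"))
    utt2spk = read_kaldi_table(os.path.join(data_dir, "utt2spk"))
    # Every utterance should appear in both files.
    mismatched = set(text) ^ set(utt2spk)
    if mismatched:
        raise ValueError(f"utterances not in both files: {sorted(mismatched)[:5]}")
    # Invert utt2spk; iterate in sorted order so each speaker's list is sorted.
    spk2utt: dict[str, list[str]] = defaultdict(list)
    for utt in sorted(utt2spk):
        spk2utt[utt2spk[utt]].append(utt)
    return text, utt2spk, dict(spk2utt)
```

For example, `load_data_dir("data/train")` would load the (hypothetical) training directory. Kaldi itself expects these files to be sorted in C-locale order. Python's default string ordering matches that for ASCII IDs.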