ParlamentParla is a speech corpus for Catalan, published by the workers' cooperative Col·lectivaT. The audio segments were extracted from recordings of the plenary sessions of the Catalan Parliament (Parlament de Catalunya). The recordings were aligned with their transcripts, and the 320 hours of cleanest segments were extracted. The content belongs to the Catalan Parliament, and the data is released in accordance with its terms of use.
Preparation of this corpus was supported by the Department of Culture of the Catalan autonomous government.
The audio files are 16-bit PCM, mono, little-endian, with a sample rate of 16 kHz. As of release version 1.0, the corpus is split into 90 hours of clean segments and 230 hours of other-quality segments.
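As a quick sanity check on a downloaded file, the format above can be verified with Python's standard `wave` module. This is a minimal sketch that assumes the segments are stored in WAV containers (the description above only states the PCM encoding); the function names and the synthetic test file are illustrative and not part of the corpus tooling.

```python
import wave

# Format stated in the corpus description: 16-bit PCM, mono, 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16 bits)
EXPECTED_RATE = 16000      # Hz


def matches_corpus_format(path):
    """Return True if the WAV file at `path` is 16-bit mono at 16 kHz."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getframerate() == EXPECTED_RATE
        )


def duration_seconds(path):
    """Return the duration of the WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_silence(path, seconds=1.0, rate=EXPECTED_RATE, channels=1):
    """Write a silent 16-bit WAV file (hypothetical helper for trying the checks)."""
    frames = int(seconds * rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH)
        wav.setframerate(rate)
        # Each frame holds one 2-byte zero sample per channel.
        wav.writeframes(b"\x00\x00" * frames * channels)
```

Summing `duration_seconds` over the files of each subset would give a figure to compare against the 90 hours and 230 hours stated above.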
Contact: info@collectivat.cat
Official ParlamentParla corpus webpage, with other resources and updates: https://collectivat.cat/asr