This is a public domain speech dataset consisting of 24018 short audio clips of a single speaker reading sentences in Polish. A transcription is provided for each clip. The clips have a total length of more than 22 hours.
The texts are in the public domain. The audio was recorded in 2021-22 as part of my master's thesis and is also in the public domain.
If you use this dataset, please cite:

@mastersthesis{mcspeech,
  title={Analiza porównawcza korpusów nagrań mowy dla celów syntezy mowy w języku polskim},
  author={Czyżnikiewicz, Mateusz},
  year={2022},
  month={December},
  school={Warsaw University of Technology},
  type={Master's thesis},
  doi={10.13140/RG.2.2.26293.24800},
  note={Available at \url{http://dx.doi.org/10.13140/RG.2.2.26293.24800}},
}
More info about the dataset can be found at https://github.com/czyzi0/the-mc-speech-dataset