Quran Speech to Text Dataset

The dataset is structured as follows:

audio_data/
  reader_1/
    001001.mp3 (surah 1, ayah 1)
    001002.mp3
    ...
  reader_2/
    001001.mp3
    ...
    114006.mp3 (surah 114, ayah 6)
  reader_3/
  ...
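
The file names above appear to encode the surah in the first three digits and the ayah in the last three. A minimal sketch of splitting a name back into its two numbers, assuming every file follows this six-digit pattern (the function name parse_audio_name is only illustrative and is not part of the dataset):

```python
from pathlib import PurePosixPath


def parse_audio_name(path):
    """Split a name such as 'reader_1/001002.mp3' into (surah, ayah)."""
    stem = PurePosixPath(path).stem  # e.g. '001002'
    # Assumption: exactly six decimal digits, three for surah and three for ayah.
    if len(stem) != 6 or not stem.isdecimal():
        raise ValueError(f"unexpected file name: {path!r}")
    return int(stem[:3]), int(stem[3:])
```

For instance, parse_audio_name("audio_data/reader_2/114006.mp3") gives (114, 6).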


Note that not all readers have all 6236 ayat of the Quran; some may not even have all 114 surahs.
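
Because coverage differs between readers, it may help to count the files per reader before training. The following is a rough sketch, assuming the audio_data/reader_N/*.mp3 layout shown above; the function name count_ayat_per_reader is hypothetical.

```python
from pathlib import Path

TOTAL_AYAT = 6236  # total number of ayat, as stated in this README


def count_ayat_per_reader(audio_root):
    """Return {reader directory name: number of .mp3 files it contains}."""
    counts = {}
    for reader_dir in sorted(Path(audio_root).iterdir()):
        if reader_dir.is_dir():
            counts[reader_dir.name] = sum(1 for _ in reader_dir.glob("*.mp3"))
    return counts


def incomplete_readers(counts):
    """Return the readers that have fewer than TOTAL_AYAT recordings."""
    return sorted(name for name, n in counts.items() if n < TOTAL_AYAT)
```

A typical call would be incomplete_readers(count_ayat_per_reader("audio_data")).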

The text of the surahs is in the all_ayat.json file, which contains every surah and ayah in Arabic.
Each JSON key has the form "1_1" for surah 1, ayah 1, or "114_2" for surah 114, ayah 2. In other words, "x_y", where x is the surah number and y is the ayah number, each up to 3 digits long with no leading zeros (unlike the audio file names). For example:
{"tafsir":{"1_1":{"text":"بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ"},"1_2":{"text":"الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ"},"1_3":{"text":"الرَّحْمَٰنِ الرَّحِيمِ"},"1_4":{"text":"مَالِكِ يَوْمِ الدِّينِ"},"1_5":{"text":"إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ"},"1_6":{"text":"اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ"}, ...}
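
A small sketch of looking up the text for a given surah and ayah, assuming the whole file has the shape shown in the sample (a top-level "tafsir" object whose values each carry a "text" field). The helper names below are illustrative only.

```python
import json


def ayah_key(surah, ayah):
    """Build a key such as '1_1' or '114_2' (no zero padding)."""
    return f"{int(surah)}_{int(ayah)}"


def load_ayat(json_path="all_ayat.json"):
    """Load the JSON file and return the mapping stored under 'tafsir'."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)["tafsir"]


def ayah_text(ayat, surah, ayah):
    """Return the Arabic text for one ayah; raises KeyError if it is absent."""
    return ayat[ayah_key(surah, ayah)]["text"]
```

For example, ayah_text(load_ayat(), 1, 2) would return the text stored under "1_2".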

Some extra notes for machine learning input:

audo_list.txt lists all mp3 files found in the audio_data directory.

transcripts.tsv is a tab-separated values file that can be used as input to a machine learning program. Each row has three fields, in this order: path, duration (in seconds), Arabic text.
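
A minimal sketch of reading transcripts.tsv with the Python standard library. It assumes three tab-separated columns per row (path, duration, text) and no header row; if the file does have a header, that row would need to be skipped first. The function name read_transcripts is hypothetical.

```python
import csv


def read_transcripts(tsv_path="transcripts.tsv"):
    """Return a list of (path, duration_seconds, text) tuples."""
    rows = []
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps any quote characters in the text untouched.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for record in reader:
            if len(record) != 3:
                continue  # skip blank or malformed lines
            path, duration, text = record
            rows.append((path, float(duration), text))
    return rows
```

Each returned tuple can then be paired with its audio file, for example to filter out clips above a chosen duration.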