Skip to content

Hetchy/Quranic-Phonemizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

79 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Qurʾānic Phonemizer

A custom phonemizer (Grapheme to Phoneme converter) for the Qurʾān in the Hafs riwaya, converting text to phoneme sequences with support for Tajweed rules.

Potential use cases:

  • Speech Recognition: Create training data for speech recognition and machine learning systems
  • Text-to-Speech: Develop accurate TTS systems for Qurʾānic Arabic
  • Linguistic Analysis: Study phonological patterns and Tajweed rule distributions across the Qurʾān
  • Educational Tools: Build interactive applications for teaching pronunciation and Tajweed

In addition to the Python API, the phonemizer can be used interactively: quranicphonemizer.com.

Table of Contents

Phoneme Inventory

The phoneme inventory uses the standard International Phonetic Alphabet (IPA) Arabic phonemes alongside custom phonemes for Tajweed rules. There is a total of 72 phonemes, corresponding to:

  • 28 consonants
  • 24 geminated consonants
  • 8 vowels
  • 12 Tajweed phonemes

All phonemes are configurable in resources/base_phonemes.yaml and resources/rule_phonemes.yaml.

Consonants

Letter Phoneme Letter Phoneme Letter Phoneme Letter Phoneme
أ , إ , ء , ؤ , ئ ʔ د d / dd ض / dˤdˤ ك k / kk
ب b / bb ذ ð / ðð ط / tˤtˤ ل l / ll / lˤlˤ
ت t / tt ر r / / rr / rˤrˤ ظ ðˤ / ðˤðˤ م m
ث θ / θθ ز z / zz ع ʕ / ʕʕ ن n
ج ʒ / ʒʒ س s / ss غ ɣ هـ h / hh
ح ħ / ħħ ش ʃ / ʃʃ ف f / ff و w / ww
خ x / xx ص / sˤsˤ ق q / qq ي , ى j / jj

Gemination (shaddah) is represented by repeating the phoneme to create new distinct phonemes. Note that there is no gemination for m / n (modelled as Tajweed instead), and for ʔ / ɣ (do not exist in the Qurʾān).

Vowels

Vowel Phoneme
َ a /
ُ u
ِ i
ا , ى a: / aˤ:
و u:
ي , ى i:

Tajweed Rules

Rule Phoneme
Iqlab
Idgham ñ / / /
Ikhfaa ŋ (Light / Shafawi)
ŋˤ (Heavy)
Qalqala Q (Sughra)
QQ (Kubra)
Tafkheem lˤlˤ (Lam in "Allah")
/ rˤrˤ (Raa)

Usage

Installation

git clone https://github.com/Hetchy/Quranic-Phonemizer.git
cd phonemizer
pip install -r requirements.txt

Quick Start

from core.phonemizer import Phonemizer

pm = Phonemizer()
res = pm.phonemize("1:1")
print(res.text())
print(res.phonemes_str())

بِسْمِ ٱللَّهِ ٱلرَّحْمَـٰنِ ٱلرَّحِيمِ (١)

bismi lla:hi rˤrˤaˤħma:ni rˤrˤaˤħi:m

Input References

phonemize() accepts a variety of flexible formats to specify which part of the Qurʾān to phonemize:

Format Example Meaning
"1" Entire chapter 1
"1:1" Verse 1 of chapter 1
"1:1:1" Word 1 of verse 1 of chapter 1
"1:1 - 1:4" Verse range: 1:1 through 1:4
"1:1 - 1:2:2" From 1:1 to word 2 of 1:2
"1 - 2:2" From entire chapter 1 through verse 2 of chapter 2

Outputs

phonemize() returns a PhonemizeResult object, containing:

Attribute Description
ref The original reference string
text() The Qurʾānic text
phonemes_list(split) Phoneme lists grouped by split: "word", "verse", or "both"
phonemes_str() Full phoneme string, configurable with separators
show_table(split) Pandas DataFrame view, grouped by split
save(path, fmt, split) Save results to JSON or CSV

Output Example (Phonemes String)

res = pm.phonemize("112", stops=["verse"])
print(res.text())
print(res.phonemes_str(phoneme_sep=" ", word_sep=" | ", verse_sep="\n"))

قُلْ هُوَ ٱللَّهُ أَحَدٌ (١) ٱللَّهُ ٱلصَّمَدُ (٢) لَمْ يَلِدْ وَلَمْ يُولَدْ (٣) وَلَمْ يَكُن لَّهُۥ كُفُوًا أَحَدٌۢ (٤)

q u l | h u w a | lˤlˤ aˤ: h u | ʔ a ħ a d QQ

ʔ a lˤlˤ aˤ: h u | sˤsˤ aˤ m a d QQ

l a m | j a l i d Q | w a l a m | j u: l a d QQ

w a l a m | j a k u | ll a h u: | k u f u w a n | ʔ a ħ a d QQ

Output Example (Table View)

res = pm.phonemize("112", stops=["verse"])
df = res.show_table()
df
location word phonemes
0 112:1:1 قُلْ qul
1 112:1:2 هُوَ huwa
2 112:1:3 ٱللَّهُ lˤlˤaˤ:hu
3 112:1:4 أَحَدٌ ʔaħadQQ
4 112:2:1 ٱللَّهُ ʔalˤlˤaˤ:hu
5 112:2:2 ٱلصَّمَدُ sˤsˤaˤmadQQ
6 112:3:1 لَمْ lam
7 112:3:2 يَلِدْ jalidQ
8 112:3:3 وَلَمْ walam
9 112:3:4 يُولَدْ ju:ladQQ
10 112:4:1 وَلَمْ walam
11 112:4:2 يَكُن jaku
12 112:4:3 لَّهُۥ llahu:
13 112:4:4 كُفُوًا kufuwan
14 112:4:5 أَحَدٌۢ ʔaħadQQ

Stops (Boundary Markers)

Optionally, pass a stops=[] list to force word/verse segmentation:

Stop key Symbol
"verse" ۝
"preferred_continue" ۖ
"preferred_stop" ۗ
"optional_stop" ۚ
"compulsory_stop" ۘ
"prohibited_stop" ۙ
ref = "68:33"
res = pm.phonemize(ref)
print(res.text())
print(res.phonemes_str())

res = pm.phonemize(ref, stops=["preferred_continue"])
print(res.phonemes_str())

res = pm.phonemize(ref, stops=["optional_stop"])
print(res.phonemes_str())

كَذٰلِكَ ٱلۡعَذَابُ‌ۖ وَلَعَذَابُ ٱلۡأَخِرَةِ أَكۡبَرُ‌ۚ لَوۡ كَانُواۡ يَعۡلَمُونَ ‏﴿٣٣﴾‏

kaða:lika lʕaða:bu walaʕaða:bu lʔa:xirˤaˤti ʔakbarˤu law ka:nu: jaʕlamu:n

kaða:lika lʕaða:bQQ walaʕaða:bu lʔa:xirˤaˤti ʔakbarˤu law ka:nu: jaʕlamu:n

kaða:lika lʕaða:bu walaʕaða:bu lʔa:xirˤaˤti ʔakba law ka:nu: jaʕlamu:n

ref = "44:43 - 44:44"
res = pm.phonemize(ref, stops=["verse"])
print(res.text())
print(res.phonemes_str(phoneme_sep="", word_sep=" ", verse_sep=""))

res = pm.phonemize(ref, stops=[])
print(res.phonemes_str(phoneme_sep="", word_sep=" ", verse_sep=""))

إِنَّ شَجَرَتَ ٱلزَّقُّومِ (٤٣) طَعَامُ ٱلْأَثِيمِ (٤٤)

ʔiña ʃaʒarˤaˤta zzaqqu:m tˤaˤʕa:mu lʔaθi:m

ʔiña ʃaʒarˤaˤta zzaqqu:mi tˤaˤʕa:mu lʔaθi:m

Contributing

If you find any issues or have feature suggestions, please feel free to email quranicphonemizer@gmail.com, open an issue or submit a pull request.

Particularly, support for other qira'at/riwayat would be very useful.

Credits

The project makes use of the Quranic Universal Library's (QUL) Hafs script.

Citing

If you use this phonemizer in your work, please cite it as follows:

@misc{ibrahim2025quranicphonemizer,
  author = {Ahmed Ibrahim},
  title = {Quranic Phonemizer},
  year = {2025},
  howpublished = {\url{https://github.com/Hetchy/Quranic-Phonemizer}},
}

Releases

No releases published

Packages

No packages published