A custom phonemizer (Grapheme to Phoneme converter) for the Qurʾān in the Hafs riwaya, converting text to phoneme sequences with support for Tajweed rules.
Potential use cases:
- Speech Recognition: Create training data for speech recognition and machine learning systems
- Text-to-Speech: Develop accurate TTS systems for Qurʾānic Arabic
- Linguistic Analysis: Study phonological patterns and Tajweed rule distributions across the Qurʾān
- Educational Tools: Build interactive applications for teaching pronunciation and Tajweed
In addition to the Python API, the phonemizer can be used interactively at quranicphonemizer.com.
- Phoneme Inventory
- Usage
- Input References
- Outputs
- Stops (Boundary Markers)
- Contributing
- Credits
- Citing
The phoneme inventory combines the standard International Phonetic Alphabet (IPA) phonemes for Arabic with custom phonemes for Tajweed rules. There are 72 phonemes in total:
- 28 consonants
- 24 geminated consonants
- 8 vowels
- 12 Tajweed phonemes
All phonemes are configurable in resources/base_phonemes.yaml and resources/rule_phonemes.yaml.
Letter | Phoneme | Letter | Phoneme | Letter | Phoneme | Letter | Phoneme |
---|---|---|---|---|---|---|---|
أ , إ , ء , ؤ , ئ | ʔ | د | d / dd | ض | dˤ / dˤdˤ | ك | k / kk |
ب | b / bb | ذ | ð / ðð | ط | tˤ / tˤtˤ | ل | l / ll / lˤlˤ |
ت | t / tt | ر | r / rˤ / rr / rˤrˤ | ظ | ðˤ / ðˤðˤ | م | m |
ث | θ / θθ | ز | z / zz | ع | ʕ / ʕʕ | ن | n |
ج | ʒ / ʒʒ | س | s / ss | غ | ɣ | هـ | h / hh |
ح | ħ / ħħ | ش | ʃ / ʃʃ | ف | f / ff | و | w / ww |
خ | x / xx | ص | sˤ / sˤsˤ | ق | q / qq | ي , ى | j / jj |
Gemination (shaddah) is represented by repeating the phoneme, which yields a new, distinct phoneme. There is no geminated phoneme for m / n (modelled as Tajweed phonemes instead) or for ʔ / ɣ (their geminated forms do not occur in the Qurʾān).
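The doubling convention can be illustrated with a short sketch. The list below is transcribed from the consonant table; the names `CONSONANTS`, `NO_GEMINATION`, and `geminate` are hypothetical and are not part of the package, whose actual inventory lives in the YAML resource files.

```python
# The 28 base consonant phonemes, transcribed from the consonant table.
CONSONANTS = [
    "ʔ", "d", "dˤ", "k", "b", "ð", "tˤ", "l", "t", "r", "ðˤ", "m", "θ", "z",
    "ʕ", "n", "ʒ", "s", "ɣ", "h", "ħ", "ʃ", "f", "w", "x", "sˤ", "q", "j",
]

# Consonants without a geminated phoneme: m / n are covered by Tajweed
# phonemes, and geminated ʔ / ɣ do not occur in the Qurʾān.
NO_GEMINATION = {"m", "n", "ʔ", "ɣ"}


def geminate(phoneme: str) -> str:
    """Return the geminated form by repeating the phoneme (hypothetical helper)."""
    if phoneme in NO_GEMINATION:
        raise ValueError(f"no geminated phoneme for {phoneme!r}")
    return phoneme * 2


# Build the geminated inventory from the base consonants.
GEMINATED = [geminate(p) for p in CONSONANTS if p not in NO_GEMINATION]

print(len(CONSONANTS), len(GEMINATED))
print(geminate("dˤ"))
```

Excluding the four consonants above leaves 24 geminated consonants, which matches the count given in the inventory summary.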
Vowel | Phoneme |
---|---|
َ | a / aˤ |
ُ | u |
ِ | i |
ا , ى | a: / aˤ: |
و | u: |
ي , ى | i: |
Rule | Phoneme |
---|---|
Iqlab | m̃ |
Idgham | ñ / m̃ / j̃ / w̃ |
Ikhfaa | ŋ (Light / Shafawi) / ŋˤ (Heavy) |
Qalqala | Q (Sughra) / QQ (Kubra) |
Tafkheem | lˤlˤ (Lam in "Allah") / rˤ / rˤrˤ (Raa) |
git clone https://github.com/Hetchy/Quranic-Phonemizer.git
cd Quranic-Phonemizer
pip install -r requirements.txt
from core.phonemizer import Phonemizer
pm = Phonemizer()
res = pm.phonemize("1:1")
print(res.text())
print(res.phonemes_str())
بِسْمِ ٱللَّهِ ٱلرَّحْمَـٰنِ ٱلرَّحِيمِ (١)
bismi lla:hi rˤrˤaˤħma:ni rˤrˤaˤħi:m
phonemize() accepts several reference formats to specify which part of the Qurʾān to phonemize:
Format Example | Meaning |
---|---|
"1" | Entire chapter 1 |
"1:1" | Verse 1 of chapter 1 |
"1:1:1" | Word 1 of verse 1 of chapter 1 |
"1:1 - 1:4" | Verse range: 1:1 through 1:4 |
"1:1 - 1:2:2" | From 1:1 to word 2 of 1:2 |
"1 - 2:2" | From entire chapter 1 through verse 2 of chapter 2 |
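To make the reference grammar concrete, here is a minimal sketch of how such strings could be split into a start and an end location. It is an illustration only, not the package's own parser; `Location`, `parse_location`, and `parse_ref` are hypothetical names, and the sketch does not check that a chapter, verse, or word actually exists.

```python
from typing import NamedTuple, Optional, Tuple


class Location(NamedTuple):
    """A chapter, optionally narrowed to a verse and a word."""
    chapter: int
    verse: Optional[int] = None
    word: Optional[int] = None


def parse_location(part: str) -> Location:
    """Parse 'chapter', 'chapter:verse' or 'chapter:verse:word'."""
    fields = [int(f) for f in part.strip().split(":")]
    if len(fields) > 3:
        raise ValueError(f"too many fields in {part!r}")
    return Location(*fields)


def parse_ref(ref: str) -> Tuple[Location, Location]:
    """Split a reference into (start, end); a single location is its own end."""
    start_text, dash, end_text = ref.partition("-")
    start = parse_location(start_text)
    end = parse_location(end_text) if dash else start
    return start, end


print(parse_ref("1:1 - 1:2:2"))
print(parse_ref("1"))
```

Missing fields stay `None`, so `"1"` is read as a whole chapter and `"1:1"` as a whole verse, mirroring the meanings listed in the table.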
phonemize() returns a PhonemizeResult object, containing:
Attribute | Description |
---|---|
ref | The original reference string |
text() | The Qurʾānic text |
phonemes_list(split) | Phoneme lists grouped by split: "word", "verse", or "both" |
phonemes_str() | Full phoneme string, configurable with separators |
show_table(split) | Pandas DataFrame view, grouped by split |
save(path, fmt, split) | Save results to JSON or CSV |
res = pm.phonemize("112", stops=["verse"])
print(res.text())
print(res.phonemes_str(phoneme_sep=" ", word_sep=" | ", verse_sep="\n"))
قُلْ هُوَ ٱللَّهُ أَحَدٌ (١) ٱللَّهُ ٱلصَّمَدُ (٢) لَمْ يَلِدْ وَلَمْ يُولَدْ (٣) وَلَمْ يَكُن لَّهُۥ كُفُوًا أَحَدٌۢ (٤)
q u l | h u w a | lˤlˤ aˤ: h u | ʔ a ħ a d QQ
ʔ a lˤlˤ aˤ: h u | sˤsˤ aˤ m a d QQ
l a m | j a l i d Q | w a l a m | j u: l a d QQ
w a l a m | j a k u | ll a h u: | k u f u w a n | ʔ a ħ a d QQ
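If only the separated string is available (for example, read back from a text file), it can be turned into nested lists again by splitting on the same separators. The sketch below assumes the separators used in the example above and is independent of the package; `split_phonemes` is a hypothetical helper, and `phonemes_list()` remains the direct way to obtain lists from a result object.

```python
from typing import List


def split_phonemes(
    text: str,
    phoneme_sep: str = " ",
    word_sep: str = " | ",
    verse_sep: str = "\n",
) -> List[List[List[str]]]:
    """Split a separated phoneme string into verses -> words -> phonemes.

    All three separators must be non-empty and distinct, otherwise the
    string cannot be split back unambiguously.
    """
    verses = []
    for verse in text.split(verse_sep):
        words = [word.split(phoneme_sep) for word in verse.split(word_sep)]
        verses.append(words)
    return verses


# First two verses of the output shown above.
sample = (
    "q u l | h u w a | lˤlˤ aˤ: h u | ʔ a ħ a d QQ\n"
    "ʔ a lˤlˤ aˤ: h u | sˤsˤ aˤ m a d QQ"
)
nested = split_phonemes(sample)
print(len(nested), [len(verse) for verse in nested])
print(nested[0][0])
```

Multi-character phonemes such as lˤlˤ, aˤ: and QQ survive as single items because splitting happens on the separator rather than per character.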
res = pm.phonemize("112", stops=["verse"])
df = res.show_table()
df
 | location | word | phonemes |
---|---|---|---|
0 | 112:1:1 | قُلْ | qul |
1 | 112:1:2 | هُوَ | huwa |
2 | 112:1:3 | ٱللَّهُ | lˤlˤaˤ:hu |
3 | 112:1:4 | أَحَدٌ | ʔaħadQQ |
4 | 112:2:1 | ٱللَّهُ | ʔalˤlˤaˤ:hu |
5 | 112:2:2 | ٱلصَّمَدُ | sˤsˤaˤmadQQ |
6 | 112:3:1 | لَمْ | lam |
7 | 112:3:2 | يَلِدْ | jalidQ |
8 | 112:3:3 | وَلَمْ | walam |
9 | 112:3:4 | يُولَدْ | ju:ladQQ |
10 | 112:4:1 | وَلَمْ | walam |
11 | 112:4:2 | يَكُن | jaku |
12 | 112:4:3 | لَّهُۥ | llahu: |
13 | 112:4:4 | كُفُوًا | kufuwan |
14 | 112:4:5 | أَحَدٌۢ | ʔaħadQQ |
Optionally, pass a stops=[] list to force word/verse segmentation:
Stop key | Symbol |
---|---|
"verse" | |
"preferred_continue" | ۖ |
"preferred_stop" | ۗ |
"optional_stop" | ۚ |
"compulsory_stop" | ۘ |
"prohibited_stop" | ۙ |
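When deciding which stop keys to pass, it can help to see which stop marks a passage actually contains. The sketch below is a standalone illustration, not the package's logic: `STOP_SYMBOLS` and `stops_in_text` are hypothetical names, and the code points are an assumed reading of the symbols in the table (Unicode Quranic annotation signs U+06D6 to U+06DA). The "verse" key is left out because the table shows no symbol for it.

```python
from typing import List

# Stop key -> assumed Unicode code point for the symbol in the table.
STOP_SYMBOLS = {
    "preferred_continue": "\u06D6",  # small high sad-lam-alef maksura
    "preferred_stop": "\u06D7",      # small high qaf-lam-alef maksura
    "optional_stop": "\u06DA",       # small high jeem
    "compulsory_stop": "\u06D8",     # small high meem
    "prohibited_stop": "\u06D9",     # small high lam-alef
}

# Reverse lookup: symbol -> stop key.
SYMBOL_TO_KEY = {symbol: key for key, symbol in STOP_SYMBOLS.items()}


def stops_in_text(text: str) -> List[str]:
    """Return the stop keys whose symbols occur in text, in order of appearance."""
    return [SYMBOL_TO_KEY[ch] for ch in text if ch in SYMBOL_TO_KEY]


# Placeholder text with two stop marks inserted via escapes.
sample = "first clause\u06D6 second clause\u06DA third clause"
print(stops_in_text(sample))
```

Any key returned by such a scan is a candidate for the stops list; keys whose symbols never appear in the passage would not be expected to change the output.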
ref = "68:33"
res = pm.phonemize(ref)
print(res.text())
print(res.phonemes_str())
res = pm.phonemize(ref, stops=["preferred_continue"])
print(res.phonemes_str())
res = pm.phonemize(ref, stops=["optional_stop"])
print(res.phonemes_str())
كَذٰلِكَ ٱلۡعَذَابُۖ وَلَعَذَابُ ٱلۡأَخِرَةِ أَكۡبَرُۚ لَوۡ كَانُواۡ يَعۡلَمُونَ ﴿٣٣﴾
kaða:lika lʕaða:bu
walaʕaða:bu lʔa:xirˤaˤti ʔakbarˤu
law ka:nu: jaʕlamu:n
kaða:lika lʕaða:bQQ
walaʕaða:bu lʔa:xirˤaˤti ʔakbarˤu
law ka:nu: jaʕlamu:n
kaða:lika lʕaða:bu
walaʕaða:bu lʔa:xirˤaˤti ʔakbarˤ
law ka:nu: jaʕlamu:n
ref = "44:43 - 44:44"
res = pm.phonemize(ref, stops=["verse"])
print(res.text())
print(res.phonemes_str(phoneme_sep="", word_sep=" ", verse_sep=""))
res = pm.phonemize(ref, stops=[])
print(res.phonemes_str(phoneme_sep="", word_sep=" ", verse_sep=""))
إِنَّ شَجَرَتَ ٱلزَّقُّومِ (٤٣) طَعَامُ ٱلْأَثِيمِ (٤٤)
ʔiña ʃaʒarˤaˤta zzaqqu:m
tˤaˤʕa:mu lʔaθi:m
ʔiña ʃaʒarˤaˤta zzaqqu:mi
tˤaˤʕa:mu lʔaθi:m
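To see exactly where a stops setting changes the output, two phoneme strings can be compared word by word. This is a minimal sketch that works on plain strings such as the ones printed above; `changed_words` is a hypothetical helper and assumes both strings use the same word separator and contain the same number of words.

```python
from typing import List, Tuple


def changed_words(a: str, b: str, word_sep: str = " ") -> List[Tuple[int, str, str]]:
    """Return (index, word_in_a, word_in_b) for every position where the words differ."""
    words_a = a.split(word_sep)
    words_b = b.split(word_sep)
    if len(words_a) != len(words_b):
        raise ValueError("inputs must contain the same number of words")
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(words_a, words_b))
        if x != y
    ]


# Verse 44:43 as printed above, with and without the verse stop.
with_stop = "ʔiña ʃaʒarˤaˤta zzaqqu:m"
without_stop = "ʔiña ʃaʒarˤaˤta zzaqqu:mi"
print(changed_words(with_stop, without_stop))
```

In this example only the last word differs: the final vowel is present when recitation continues into the next verse and dropped when the verse end is treated as a stop.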
If you find any issues or have feature suggestions, please feel free to email quranicphonemizer@gmail.com, open an issue or submit a pull request.
In particular, contributions that add support for other qira'at/riwayat would be very useful.
The project makes use of the Quranic Universal Library's (QUL) Hafs script.
If you use this phonemizer in your work, please cite it as follows:
@misc{ibrahim2025quranicphonemizer,
author = {Ahmed Ibrahim},
title = {Quranic Phonemizer},
year = {2025},
howpublished = {\url{https://github.com/Hetchy/Quranic-Phonemizer}},
}