Arabic diacritics






From Wikipedia, the free encyclopedia
 


Early written Arabic used only rasm (in black). Later, i‘jām (in red) were added so that letters such as ṣād (ص) and ḍād (ض) could be distinguished. Ḥarakāt (in blue), which are used in the Qur'an but not in most written Arabic, indicate short vowels, long consonants, and some other vocalizations.

The Arabic script has numerous diacritics, which include consonant pointing known as iʻjām (إِعْجَام), and supplementary diacritics known as tashkīl (تَشْكِيل). The latter include the vowel marks termed ḥarakāt (حَرَكَات; sg. حَرَكَة, ḥarakah).

The Arabic script is a modified abjad, in which short consonants and long vowels are represented by letters, but short vowels and consonant length are not generally indicated in writing. Tashkīl is an optional means of representing the missing vowels and consonant length. Modern Arabic is always written with the i‘jām (consonant pointing), but only religious texts, children's books and works for learners are written with the full tashkīl (vowel guides and consonant length). It is, however, not uncommon for authors to add diacritics to a word or letter when the grammatical case or the meaning would otherwise be ambiguous. In addition, classical works and historical documents published for the general public are often rendered with the full tashkīl, to compensate for the gap in understanding resulting from stylistic changes over the centuries.

Tashkīl

The literal meaning of تَشْكِيل tashkīl is 'variation'. Because ordinary Arabic text does not provide enough information about the correct pronunciation, the main purpose of tashkīl (and ḥarakāt) is to provide a phonetic guide, i.e. to show the correct pronunciation to children who are learning to read and to foreign learners.

The bulk of Arabic script is written without ḥarakāt (or short vowels). However, they are commonly used in texts that demand strict adherence to exact pronunciation. This is true primarily of the Qur'an ٱلْقُرْآن (al-Qurʾān) and of poetry. It is also quite common to add ḥarakāt to hadiths ٱلْحَدِيث (al-ḥadīth; plural: al-aḥādīth) and the Bible. Another use is in children's literature. Moreover, ḥarakāt are used in ordinary texts on individual words when an ambiguity of pronunciation cannot easily be resolved from context alone. Arabic dictionaries with vowel marks provide information about the correct pronunciation to both native and foreign Arabic speakers. In art and calligraphy, ḥarakāt might be used simply because they are considered aesthetically pleasing.

An example of fully vocalised (vowelised or vowelled) Arabic text, from the Bismillah:

بِسْمِ ٱللَّٰهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
bism Allāh al-Raḥmān al-Raḥīm
In the name of God, the All-Merciful, the Especially-Merciful.
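In Unicode text, each ḥarakah is stored as a separate combining character after its base letter, so a vocalised spelling differs from the unvocalised one only by those marks. The following Python sketch uses only the standard `unicodedata` module to list and strip the marks in the first word of the Bismillah. The helper names are illustrative, and filtering on the general category `Mn` is a deliberately blunt assumption: it removes every nonspacing mark, not only tashkīl.

```python
import unicodedata

# First word of the Bismillah, fully vocalised:
# beh + kasrah, seen + sukun, meem + kasrah
BISMI = "\u0628\u0650\u0633\u0652\u0645\u0650"


def list_marks(text: str) -> list[str]:
    """Return the Unicode names of all nonspacing marks in the text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Mn"]


def strip_marks(text: str) -> str:
    """Drop every nonspacing mark, leaving only the base letters (the rasm plus i'jam)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(list_marks(BISMI))
    print(strip_marks(BISMI))
```

A production tool would more likely remove an explicit list of code points, so that marks it wants to keep are not lost.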

Some Arabic textbooks for foreigners now use ḥarakāt as a phonetic guide to make learning to read Arabic easier. The other method used in textbooks is phonetic romanisation of unvocalised texts. Fully vocalised Arabic texts (i.e. Arabic texts with ḥarakāt/diacritics) are sought after by learners of Arabic. Some online bilingual dictionaries also provide ḥarakāt as a phonetic guide, much as English dictionaries provide a phonetic transcription.

Harakat (short vowel marks)

The ḥarakāt حَرَكَات, literally 'motions', are the short vowel marks. There is some ambiguity as to which tashkīl are also ḥarakāt; the tanwīn, for example, are markers for both vowels and consonants.

Fatḥah

ـَ

The fatḥah فَتْحَة is a small diagonal line placed above a letter, and represents a short /a/ (like the /a/ sound in the English word "cat"). The word fatḥah itself (فَتْحَة) means opening and refers to the opening of the mouth when producing an /a/. For example, with dāl (henceforth, the base consonant in the following examples): دَ /da/.

When a fatḥah is placed before a plain letter ا (alif) (i.e. one having no hamza or vowel of its own), it represents a long /aː/ (close to the sound of "a" in the English word "dad", with an open front vowel /æː/, not back /ɑː/ as in "father"). For example: دَا /daː/. The fatḥah is not usually written in such cases. When a fatḥah is placed before the letter ⟨ي⟩ (yā’), it creates an /aj/ (as in "lie"); and when placed before the letter ⟨و⟩ (wāw), it creates an /aw/ (as in "cow").

Although a fatḥah paired with a plain letter creates an open front vowel (/a/), often realized as near-open (/æ/), the standard also allows for variations, especially under certain surrounding conditions. Usually, for the more central (/ä/) or back (/ɑ/) pronunciation to occur, the word features a nearby back consonant, such as the emphatics, qāf, or rā’. Other vowels also take on a similar "back" quality in the presence of such consonants, though less drastically than fatḥah.[1][2][3]

Kasrah

ـِ

A similar diagonal line below a letter is called a kasrah كَسْرَة and designates a short /i/ (as in "me", "be") and its allophones [i, ɪ, e, e̞, ɛ] (as in "Tim", "sit"). For example: دِ /di/.[4]

When a kasrah is placed before a plain letter ي (yā’), it represents a long /iː/ (as in the English word "steed"). For example: دِي /diː/. The kasrah is usually not written in such cases, but if yā’ is pronounced as a diphthong /aj/, fatḥah should be written on the preceding consonant to avoid mispronunciation. The word kasrah means 'breaking'.[1]

Ḍammah

ـُ

The ḍammah ضَمَّة is a small curl-like diacritic placed above a letter to represent a short /u/ (as in "duke", shorter "you") and its allophones [u, ʊ, o, o̞, ɔ] (as in "put", or "bull"). For example: دُ /du/.[4]

When a ḍammah is placed before a plain letter و (wāw), it represents a long /uː/ (like the 'oo' sound in the English word "swoop"). For example: دُو /duː/. The ḍammah is usually not written in such cases, but if wāw is pronounced as a diphthong /aw/, fatḥah should be written on the preceding consonant to avoid mispronunciation.[1]

The word ḍammah (ضَمَّة) in this context means rounding, since it is the only rounded vowel in the vowel inventory of Arabic.
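The rules for the three short vowels, and for how each combines with a following plain letter, can be summarised as a small lookup. This is only a sketch of the rules as stated above: the function name and the romanised outputs are illustrative assumptions, and the caller is assumed to have already checked that the following letter is "plain" (no hamza or vowel of its own).

```python
FATHAH, DAMMAH, KASRAH = "\u064E", "\u064F", "\u0650"
ALIF, WAW, YA = "\u0627", "\u0648", "\u064A"

# A haraka on its own marks a short vowel.
SHORT = {FATHAH: "a", KASRAH: "i", DAMMAH: "u"}

# A haraka followed by its matching plain letter marks a long vowel.
LONG = {(FATHAH, ALIF): "ā", (KASRAH, YA): "ī", (DAMMAH, WAW): "ū"}

# Fathah followed by ya' or waw gives a diphthong.
DIPHTHONG = {(FATHAH, YA): "ay", (FATHAH, WAW): "aw"}


def read_vowel(mark: str, next_letter: str = "") -> str:
    """Return the vowel signalled by a haraka and the plain letter after it, if any."""
    pair = (mark, next_letter)
    if pair in LONG:
        return LONG[pair]
    if pair in DIPHTHONG:
        return DIPHTHONG[pair]
    return SHORT[mark]


if __name__ == "__main__":
    print("d" + read_vowel(FATHAH))        # dal + fathah
    print("d" + read_vowel(FATHAH, ALIF))  # dal + fathah + alif
    print("d" + read_vowel(FATHAH, WAW))   # dal + fathah + waw
```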

Alif Khanjariyah

ــٰ

The superscript (or dagger) alif أَلِف خَنْجَرِيَّة (alif khanjarīyah) is written as a short vertical stroke on top of a consonant. It indicates a long /aː/ sound for which alif is normally not written. For example: هَٰذَا (hādhā) or رَحْمَٰن (raḥmān).

The dagger alif occurs in only a few words, but they include some common ones; it is seldom written, however, even in fully vocalised texts. Most keyboards do not have dagger alif. The word Allah الله (Allāh) is usually produced automatically by entering alif lām lām hāʾ. The word consists of alif + ligature of doubled lām with a shaddah and a dagger alif above lām, followed by ha'.

Maddah

ـٓ

آ

The maddah مَدَّة is a tilde-shaped diacritic, which can only appear on top of an alif (آ) and indicates a glottal stop /ʔ/ followed by a long /aː/.

In theory, the same sequence /ʔaː/ could also be represented by two alifs, as in *أَا, where a hamza above the first alif represents the /ʔ/ while the second alif represents the /aː/. However, consecutive alifs are never used in the Arabic orthography. Instead, this sequence must always be written as a single alif with a maddah above it, the combination known as an alif maddah. For example: قُرْآن /qurˈʔaːn/.
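In Unicode, the alif maddah exists both as a single precomposed character (U+0622) and as a plain alif followed by the combining maddah (U+0653); the two are canonically equivalent. The sketch below, using Python's standard `unicodedata` module, shows normalization converting between them. The variable names are illustrative.

```python
import unicodedata

ALIF = "\u0627"          # ARABIC LETTER ALEF
MADDAH_ABOVE = "\u0653"  # ARABIC MADDAH ABOVE (combining)
ALIF_MADDAH = "\u0622"   # ARABIC LETTER ALEF WITH MADDA ABOVE (precomposed)

# qur'an typed with alif + combining maddah instead of the precomposed letter
QURAN_DECOMPOSED = "\u0642\u064F\u0631\u0652" + ALIF + MADDAH_ABOVE + "\u0646"


def compose(text: str) -> str:
    """Normalize to NFC, which merges alif + maddah into one code point."""
    return unicodedata.normalize("NFC", text)


def decompose(text: str) -> str:
    """Normalize to NFD, which splits the precomposed alif maddah again."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    composed = compose(QURAN_DECOMPOSED)
    print(len(QURAN_DECOMPOSED), len(composed))
    print(ALIF_MADDAH in composed)
```

Normalizing both sides to the same form before comparing strings avoids treating the two spellings as different words.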

Alif waslah

ٱ

The waṣlah وَصْلَة, alif waṣlah أَلِف وَصْلَة or hamzat waṣl هَمْزَة وَصْل looks like a small letter ṣād on top of an alif ٱ (also indicated by an alif ا without a hamzah). It means that the alif is not pronounced when its word does not begin a sentence. For example: بِٱسْمِ (bismi), but ٱمْشُوا۟ (imshū, not mshū). This is because no Arabic word can start with a vowel-less consonant. If the second letter after the waṣlah has a kasrah, the alif waṣlah is pronounced /i/; if it has a ḍammah, it is pronounced /u/.

It occurs only in the beginning of words, but it can occur after prepositions and the definite article. It is commonly found in imperative verbs, the perfective aspect of verb stems VII to X and their verbal nouns (maṣdar). The alif of the definite article is considered a waṣlah.

It occurs in phrases and sentences (connected speech, not isolated/dictionary forms):

Like the superscript alif, it is not written in fully vocalized scripts, except for sacred texts, like the Quran and Arabized Bible.

Sukūn

ـْـ

The sukūn سُكُونْ is a circle-shaped diacritic placed above a letter ( ْ). It indicates that the consonant to which it is attached is not followed by a vowel, i.e., zero-vowel.

It is a necessary symbol for writing consonant-vowel-consonant syllables, which are very common in Arabic. For example: دَدْ (dad).

The sukūn may also be used to help represent a diphthong. A fatḥah followed by the letter ي (yā’) with a sukūn over it (ـَيْ) indicates the diphthong ay (IPA /aj/). A fatḥah followed by the letter و (wāw) with a sukūn (ـَوْ) indicates /aw/.

ـۡـ

The sukūn may also have an alternative form, the small high head of ḥāʾ (U+06E1 ۡ ARABIC SMALL HIGH DOTLESS HEAD OF KHAH), particularly in some Qurans. Other shapes may exist as well (for example, like a small comma above ⟨ʼ⟩ or like a circumflex ⟨ˆ⟩ in nastaʿlīq).[5]

Tanwin

ـٌ

ـٍ

ـً

The three vowel diacritics may be doubled at the end of a word to indicate that the vowel is followed by the consonant n. They may or may not be considered ḥarakāt and are known as tanwīn تَنْوِين, or nunation. The signs indicate, from left to right, -un, -in, -an.

These endings are used as non-pausal grammatical indefinite case endings in Literary Arabic or Classical Arabic (triptotes only). In a vocalised text, they may be written even if they are not pronounced (see pausa). See i‘rāb for more details. In many spoken Arabic dialects, the endings are absent. Many Arabic textbooks introduce standard Arabic without these endings. The grammatical endings may not be written in some vocalized Arabic texts, as knowledge of i‘rāb varies from country to country, and there is a trend towards simplifying Arabic grammar.

The sign ـً is most commonly written in combination with ـًا (alif), ةً (tā’ marbūṭah), أً (alif hamzah) or stand-alone ءً (hamzah). Alif should always be written (except for words ending in tā’ marbūṭah, hamzah or diptotes) even if the -an is not. Grammatical cases and tanwīn endings in indefinite triptote forms:

Shaddah

ـّـ

The shadda or shaddah شَدَّة (shaddah), or tashdid تَشْدِيد (tashdīd), is a diacritic shaped like a small written Latin "w".

It is used to indicate gemination (consonant doubling or extra length), which is phonemic in Arabic. It is written above the consonant which is to be doubled. It is the only ḥarakah that is commonly used in ordinary spelling to avoid ambiguity. For example: دّ /dd/; madrasah مَدْرَسَة ('school') vs. mudarrisah مُدَرِّسَة ('teacher', female).
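The madrasah/mudarrisah pair shows why shaddah is often kept when other marks are dropped: without any diacritics the two words are spelled identically. The sketch below is a hypothetical helper (not a standard library function) that removes every nonspacing mark except shaddah, using Python's `unicodedata` module.

```python
import unicodedata

SHADDAH = "\u0651"

# madrasah ('school') and mudarrisah ('teacher', female), fully vocalised
MADRASAH = "\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629"
MUDARRISAH = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633\u064E\u0629"


def keep_only_shaddah(text: str) -> str:
    """Remove all nonspacing marks except shaddah."""
    return "".join(
        ch for ch in text
        if ch == SHADDAH or unicodedata.category(ch) != "Mn"
    )


def strip_all_marks(text: str) -> str:
    """Remove every nonspacing mark, shaddah included."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    # With every mark stripped, the two words collapse to the same spelling.
    print(strip_all_marks(MADRASAH) == strip_all_marks(MUDARRISAH))
    # Keeping shaddah alone is enough to tell them apart.
    print(keep_only_shaddah(MADRASAH) == keep_only_shaddah(MUDARRISAH))
```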

I‘jām

7th-century kufic script without any ḥarakāt or i‘jām.

The i‘jām (إِعْجَام; sometimes also called nuqaṭ)[6] are the diacritic points that distinguish various consonants that have the same form (rasm), such as ص /sˤ/, ض /dˤ/. Typically i‘jām are not considered diacritics but part of the letter.

Early manuscripts of the Quran did not use diacritics either for vowels or to distinguish the different values of the rasm. Vowel pointing was introduced first, as a red dot placed above, below, or beside the rasm, and later consonant pointing was introduced, as thin, short black single or multiple dashes placed above or below the rasm. These i‘jām became black dots about the same time as the ḥarakāt became small black letters or strokes.

Typically, Egyptians do not use dots under final yā’ (ي), which then looks exactly like alif maqṣūrah (ى) in handwriting and in print. This practice is also used in copies of the muṣḥaf (Qurʾān) scribed by ‘Uthman Ṭāhā. The same unification of yā’ and alif maqṣūrah has happened in Persian, resulting in what the Unicode Standard calls "Arabic Letter Farsi Yeh", which looks exactly the same as yā’ in initial and medial forms, but exactly the same as alif maqṣūrah in final and isolated forms.
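Because these three letters have separate Unicode code points but can look identical in final position, search and matching code sometimes folds them together. The sketch below shows the three code points and one such folding; the choice to map everything to yā’ is an illustrative assumption, and the folding is lossy, so it suits matching rather than display.

```python
import unicodedata

YEH = "\u064A"           # ARABIC LETTER YEH (dotted in all positions)
ALEF_MAKSURA = "\u0649"  # ARABIC LETTER ALEF MAKSURA (never dotted)
FARSI_YEH = "\u06CC"     # ARABIC LETTER FARSI YEH (dotted only initially/medially)

# Translation table mapping both variants onto plain yeh.
_FOLD = str.maketrans({ALEF_MAKSURA: YEH, FARSI_YEH: YEH})


def fold_yeh(text: str) -> str:
    """Fold alef maksura and Farsi yeh to yeh, for comparison purposes only."""
    return text.translate(_FOLD)


if __name__ == "__main__":
    for ch in (YEH, ALEF_MAKSURA, FARSI_YEH):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # The same word typed with a dotted and an undotted final letter.
    print(fold_yeh("\u0639\u0644\u0649") == fold_yeh("\u0639\u0644\u064A"))
```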

Isolated kāf with ‘alāmātu-l-ihmāl and without top stroke next to initial kāf with top stroke.

سۡ سۜ سۣ سٚ ڛ

At the time when the i‘jām was optional, unpointed letters were ambiguous. To clarify that a letter would lack i‘jām in pointed text, the letter could be marked with a small v- or seagull-shaped diacritic above, a superscript semicircle (crescent), a subscript dot (except in the case of ح; three dots were used with س), or a subscript miniature of the letter itself. A superscript stroke known as jarrah, resembling a long fatḥah, was used for a contracted (assimilated) sīn. Thus ڛ سۣ سۡ سٚ were all used to indicate that the letter in question was truly س and not ش.[7] These signs, collectively known as ‘alāmātu-l-ihmāl, are still occasionally used in modern Arabic calligraphy, either for their original purpose (i.e. marking letters without i‘jām) or, often, as purely decorative space-fillers. The small ک above the kāf in its final and isolated forms ك  ـك was originally an ‘alāmatu-l-ihmāl that became a permanent part of the letter. Previously this sign could also appear above the medial form of kāf, when that letter was written without the stroke on its ascender. When kāf was written without that stroke, it could be mistaken for lām; kāf was therefore distinguished with a superscript kāf or a small superscript hamza (nabrah), and lām with a superscript l-a-m (lām-alif-mīm).[8]

Hamza

ئ  ؤ  إ  أ ء

Although it is sometimes not considered a letter of the alphabet, the hamza هَمْزة (hamzah, glottal stop) often stands as a separate letter in writing, is written in unpointed texts, and is not considered a tashkīl. It may appear as a letter by itself or as a diacritic over or under an alif, wāw, or yā’.

Which letter is to be used to support the hamzah depends on the quality of the adjacent vowels;

Consider the following words: أَخ /ʔax/ ("brother"), إسْماعِيل /ʔismaːʕiːl/ ("Ismael"), أُمّ /ʔumm/ ("mother"). All three of the above words "begin" with a vowel opening the syllable, and in each case, alif is used to designate the initial glottal stop (the actual beginning). But if we consider middle syllables "beginning" with a vowel: نَشْأة /naʃʔa/ ("origin"), أَفْئِدة /ʔafʔida/ ("hearts"; note the /ʔi/ syllable; singular فُؤاد /fuʔaːd/), رُؤُوس /ruʔuːs/ ("heads", singular رَأْس /raʔs/), the situation is different, as noted above. See the comprehensive article on hamzah for more details.
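Unicode mirrors the description of hamza as either a letter by itself or a mark on a supporting letter: the stand-alone hamza has no decomposition, while each supported form decomposes into its carrier letter plus a combining hamza above or below. The following sketch reads this from Python's standard `unicodedata` module; the `describe` helper is illustrative.

```python
import unicodedata

# Stand-alone hamza, then hamza on alif (above and below), waw and ya'.
HAMZA_FORMS = ["\u0621", "\u0623", "\u0625", "\u0624", "\u0626"]


def describe(ch: str) -> tuple:
    """Return (name, carrier name, mark name); carrier and mark are None if absent."""
    parts = unicodedata.decomposition(ch).split()
    if not parts:
        return (unicodedata.name(ch), None, None)
    base, mark = (chr(int(p, 16)) for p in parts)
    return (unicodedata.name(ch), unicodedata.name(base), unicodedata.name(mark))


if __name__ == "__main__":
    for form in HAMZA_FORMS:
        print(f"U+{ord(form):04X}", describe(form))
```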

Diacritics not used in Modern Standard Arabic

Diacritics not used in Modern Standard Arabic but in other languages that use the Arabic script, and sometimes to write Arabic dialects, include (the list is not exhaustive):

Description Unicode Example Language(s) Notes
Bars and lines
diagonal bar above گ Arabic (Iraq), Balti, Burushaski, Kashmiri, Kazakh, Khowar, Kurdish, Kyrgyz, Persian, Sindhi, Urdu, Uyghur
  • Diagonal bar above kaf to create gaf: گ (IPA /ɡ/)
  • When writing Arabic, often used in Iraq to represent the sound /ɡ/, including in foreign words written in Arabic script, while in Morocco the variant ݣ is seen.[9]
horizontal bar above Pashto
vertical line above ئۈ Uyghur
  • the letter ئۈ (IPA /y/) contains a vertical line above the vav
Dots
2 dots (vertical) ݭ ݙ
4 dots ڐ‎ ٿ ڐ ڙ Sindhi, Old Hindustani
dot below U+065C ٜ ARABIC VOWEL SIGN DOT BELOW ٜ   بٜ African languages[10]
  • also used in Quranic text in African and other orthographies[10]
Variants of standard Arabic diacritics
wavy hamza ٲ اٟ Kashmiri
  • The Kashmiri language written in Arabic script includes the diacritic or "wavy hamza".
  • In Kashmiri the diacritic is called āmālü mad when used above alif: ٲ to create the vowel /əː/.[11]
  • Kashmiri calls the wavy hamza sāȳ when below the alif: اٟ to create the sound /ɨː/.[12]
curly kasra above ◌ࣥ Rohingya
  • Latin "ou"
Rohingya
  • Latin "oñ"
double kasra above ◌ࣱ Rohingya
  • Latin "uñ"
inverted and regular curly kasras above ◌ࣨ Rohingya
  • Latin "ouñ"
Tildes
diagonal tilde shape above ◌ࣤ Rohingya
  • Latin "o"
diagonal tilde shape below ◌ࣦ Rohingya
  • Latin "e"
Arabic letters
miniature Arabic letter hah (initial form) ﺣ above ◌ۡ Rohingya
  • Sukun (zero-vowel)
miniature Arabic letter tah ط above ݲ Urdu
Eastern Arabic numerals[13]
Eastern Arabic numeral 2: ٢ above U+0775, U+0778, U+077A ݵ ݸ ݺ Burushaski
  • Present in the Burushaski letters ݸ‎ and ݺ
Eastern Arabic numeral 3: ٣ above U+0776, U+0779, U+077B ݶ ݹ ݻ Burushaski
  • Present in the Burushaski letters ݶ‎, ݹ‎ and ݻ
Urdu number 4: ۴ above or below U+0777, U+077C, U+077D ݷ ݼ ݽ Burushaski
  • Present in the Burushaski letters ݼ‎ and ݽ
Shapes like Latin letters
Nūn ġuṇnā, "u" shape above ن٘ Urdu
  • Vowel nasalization is represented by nun ghunna, which in medial form is written as nun with the diacritic maghnoona (also called ulta jazm, Unicode U+0658) above: ن٘.
"v" shape above ۆ   ؤیٛ Azerbaijani
  • used only on top of vav: ۆ equivalent to Latin ü, Cyrillic ү, IPA /y/
inverted "v" shape above ئۆ  Azerbaijani, Uyghur
  • in Azerbaijani, used only on top of ye: یٛ is equivalent to Latin ı, Cyrillic ы, IPA /ɯ/
  • in Uyghur, the letter ئۆ (IPA /ø/) contains the v shape above the vav
dotted fatha ◌ࣵ Wolof Latin à
circle with fatha ◌ࣴ‎ Wolof Latin ë
less than sign - below ◌ࣹ‎ Wolof Latin e
greater than sign - below ◌ࣺ‎ Wolof Latin é
less than sign - above ◌ࣷ‎ Wolof Latin o
greater than sign - above ◌ࣸ‎ Wolof Latin ó
ring ګ Pashto
  • kaf with ring (ګ) is used for IPA /ɡ/
Other shapes
"fish" shape above دࣤ࣬  دࣥ࣬  دࣦ࣯ Rohingya Ṭāna, e.g. دࣤ࣬ / دࣥ࣬ / دࣦ࣯‎ written above or below other diacritics to mark a long rising tone (/˨˦/).[14][15]
Various Urdu
  • Special diacritics usually found only in dictionaries for clarification of irregular pronunciation include kasrah-e-majhool, fathah-e-majhool, dammah-e-majhool, and alif-e-wavi.[16]

Rohingya tone markers

Historically, the Arabic script has been adopted and used by many tonal languages; examples include Xiao'erjing for Mandarin Chinese and the Ajami script adopted for writing various languages of Western Africa. However, the Arabic script never had an inherent way of representing tones until it was adapted for the Rohingya language. The Rohingya Fonna are three tone markers which are part of the standardized and accepted orthographic convention of Rohingya. It remains the only known instance of tone markers within the Arabic script.[14][15]

Tone markers act as "modifiers" of vowel diacritics. In simpler words, they are "diacritics for the diacritics". They are written "outside" of the word, meaning that they are written above the vowel diacritic if the diacritic is written above the word, and below the diacritic if the diacritic is written below the word. They are only ever written where there are vowel diacritics. This is important to note because, without the diacritic present, there is no way to distinguish between tone markers and i‘jām, i.e. the dots that are used to distinguish consonants.

Hārbāy

◌࣪ / ◌࣭

The Hārbāy, as it is called in Rohingya, is a single dot that is placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah (vowel diacritics unique to Rohingya), or their respective Fatḥatan and Ḍammatan versions, and underneath Kasrah or curly Kasrah, or their respective Kasratan versions (e.g. دً࣪ / دٌ࣪ / دࣨ࣪ / دٍ࣭‎). This tone marker indicates a short high tone (/˥/).[14][15]

Ṭelā

◌࣫ / ◌࣮

The Ṭelā, as it is called in Rohingya, is two dots that are placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah, or their respective Fatḥatan and Ḍammatan versions, and underneath Kasrah or curly Kasrah, or their respective Kasratan versions (e.g. دَ࣫ / دُ࣫ / دِ࣮‎). This tone marker indicates a long falling tone (/˥˩/).[14][15]

Ṭāna

◌࣬ / ◌࣯

The Ṭāna, as it is called in Rohingya, is a fish-like looping line that is placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah, or their respective Fatḥatan and Ḍammatan versions, and underneath Kasrah or curly Kasrah, or their respective Kasratan versions (e.g. دࣤ࣬ / دࣥ࣬ / دࣦ࣯‎). This tone marker indicates a long rising tone (/˨˦/).[14][15]

History

Evolution of early Arabic calligraphy (9th–11th century). The basmala was taken as an example, from Kufic Qur'an manuscripts.
(1) Early 9th century, script with no dots or diacritic marks (see image of early Basmala Kufic);
(2) and (3) 9th–10th century under Abbasid dynasty, Abu al-Aswad's system established red dots with each arrangement or position indicating a different short vowel; later, a second black-dot system was used to differentiate between letters like fā’ and qāf;
(4) 11th century, in al-Farāhīdī's system (the system we know today) dots were changed into shapes resembling the letters to transcribe the corresponding long vowels.

According to tradition, the first to commission a system of ḥarakāt was Ali, who appointed Abu al-Aswad al-Du'ali for the task. Abu al-Aswad devised a system of dots to signal the three short vowels (along with their respective allophones) of Arabic. This system of dots predates the i‘jām, the dots used to distinguish between different consonants.

Abu al-Aswad's system

Abu al-Aswad's system of ḥarakāt was different from the system we know today. The system used red dots, with each arrangement or position indicating a different short vowel.

A dot above a letter indicated the vowel a, a dot below indicated the vowel i, a dot on the side of a letter stood for the vowel u, and two dots stood for the tanwīn.

However, the early manuscripts of the Qur'an did not use the vowel signs for every letter requiring them, but only for letters where they were necessary for a correct reading.

Al Farahidi's system

The precursor to the system we know today is al-Farāhīdī's system. Al-Farāhīdī found that the task of writing using two different colours was tedious and impractical. Another complication was that the i‘jām had been introduced by then; although they were short strokes rather than the round dots seen today, without a colour distinction the two could be confused.

Accordingly, he replaced the ḥarakāt with small superscript letters: small alif, yā’, and wāw for the short vowels corresponding to the long vowels written with those letters, a small s(h)īn for shaddah (geminate), a small khā’ for khafīf (short consonant; no longer used). His system is essentially the one we know today.[17]

Automatic diacritization

The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. It is useful to avoid ambiguity in applications such as Arabic machine translation, text-to-speech, and information retrieval. Automatic diacritization algorithms have been developed.[18][19] For Modern Standard Arabic, a 2021 study reported a state-of-the-art word error rate (WER) of 4.79%, with proper nouns and case endings the most common sources of error.[20] Similar algorithms exist for other varieties of Arabic.[21]
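As a rough illustration of the metric, a diacritization WER can be computed as the share of words whose diacritized form differs from a reference. The sketch below assumes that the reference and the system output contain the same words in the same order, so no alignment is needed, and it normalizes both to NFC so that the storage order of stacked marks does not count as an error. It is not the evaluation procedure of the cited studies, whose exact scoring rules may differ.

```python
import unicodedata


def diacritization_wer(reference: str, hypothesis: str) -> float:
    """Fraction of whitespace-separated words whose diacritized forms differ."""
    ref_words = unicodedata.normalize("NFC", reference).split()
    hyp_words = unicodedata.normalize("NFC", hypothesis).split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same number of words")
    if not ref_words:
        return 0.0
    errors = sum(r != h for r, h in zip(ref_words, hyp_words))
    return errors / len(ref_words)


if __name__ == "__main__":
    madrasah = "\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629"
    mudarrisah = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633\u064E\u0629"
    reference = madrasah + " " + mudarrisah
    hypothesis = madrasah + " " + madrasah  # second word diacritized wrongly
    print(diacritization_wer(reference, hypothesis))
```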

References

  1. Karin C. Ryding, "A Reference Grammar of Modern Standard Arabic", Cambridge University Press, 2005, pp. 25–34, specifically "Chapter 2, Section 4: Vowels"
  2. Anatole Lyovin, Brett Kessler, William Ronald Leben, "An Introduction to the Languages of the World", "5.6 Sketch of Modern Standard Arabic", Oxford University Press, 2017, p. 255, Edition 2, specifically "5.6.2.2 Vowels"
  3. Amine Bouchentouf, Arabic For Dummies®, John Wiley & Sons, 2018, 3rd Edition, specifically section "All About Vowels"
  4. "Introduction to Written Arabic". University of Victoria, Canada.
  5. "Arabic character notes". r12a.
  6. Ibn Warraq (2002). Ibn Warraq (ed.). What the Koran Really Says : Language, Text & Commentary. Translated by Ibn Warraq. New York: Prometheus. p. 64. ISBN 1-57392-945-X. Archived from the original on 11 April 2019. Retrieved 9 April 2019.
  7. Gacek, Adam (2009). "Unpointed letters". Arabic Manuscripts: A Vademecum for Readers. BRILL. p. 286. ISBN 978-90-04-17036-0.
  8. Gacek, Adam (1989). "Technical Practices and Recommendations Recorded by Classical and Post-Classical Arabic Scholars Concerning the Copying and Correction of Manuscripts" (PDF). In Déroche, François (ed.). Les manuscrits du Moyen-Orient: essais de codicologie et de paléographie. Actes du colloque d'Istanbul (Istanbul 26–29 mai 1986). p. 57 (§ 8. Diacritical marks and vowelisation).
  9. Alkalesi, Yasin M. (2001) "Modern iraqi arabic: A textbook". Georgetown University Press. ISBN 978-0878407880
  10. "Arabic Range: 0600–06FF The Unicode Standard, Version 15.1" (PDF). Unicode. Retrieved 10 July 2024.
  11. "Vowel 04: ٲ / ä – (aae)". Kashmiri Dictionary. 31 January 2021. Retrieved 11 July 2024.
  12. "Vowel07: اٟ / ü ( ι )". Kashmiri Dictionary. 6 February 2021. Retrieved 11 July 2024.
  13. Mirza, Umair (2006). بروشسکی اردو لغت [Burushaski–Urdu Dictionary] (in Urdu and Burushaski). pp. 28–29. ISBN 969-404-66-0. Retrieved 13 July 2024.
  14. Priest, Lorna A.; Hosken, Martin (10 August 2010). "Proposal to add Arabic script characters for African and Asian languages" (PDF). The Unicode Consortium. Archived (PDF) from the original on 8 October 2022. Retrieved 5 May 2023.
  15. Pandey, Anshuman (27 October 2015). "Proposal to encode the Hanifi Rohingya script in Unicode" (PDF). The Unicode Consortium. Archived (PDF) from the original on 12 December 2019. Retrieved 5 May 2023.
  16. "Proposal of Inclusion of Certain Characters in Unicode" (PDF).
  17. Versteegh, C. H. M. (1997). The Arabic Language. Columbia University Press. pp. 56ff. ISBN 978-0-231-11152-2.
  18. Azmi, Aqil M.; Almajed, Reham S. (2013-10-10). "A survey of automatic Arabic diacritization techniques". Natural Language Engineering. 21 (3): 477–495. doi:10.1017/S1351324913000284. ISSN 1351-3249. S2CID 31560671.
  19. Almanea, Manar (2021). "Automatic Methods and Neural Networks in Arabic Texts Diacritization: A Comprehensive Survey". IEEE Access. 9: 145012–145032. Bibcode:2021IEEEA...9n5012A. doi:10.1109/ACCESS.2021.3122977. ISSN 2169-3536. S2CID 240011970.
  20. Thompson, Brian; Alshehri, Ali (2021-09-28). "Improving Arabic Diacritization by Learning to Diacritize and Translate". arXiv:2109.14150 [cs.CL].
  21. Masmoudi, Abir; Aloulou, Chafik; Abdellahi, Abdel Ghader Sidi; Belguith, Lamia Hadrich (2021-08-08). "Automatic diacritization of Tunisian dialect text using SMT model". International Journal of Speech Technology. 25: 89–104. doi:10.1007/s10772-021-09864-6. ISSN 1572-8110. S2CID 238782966.


    This page was last edited on 14 July 2024, at 01:43 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


