Light Diacritic Restoration to Disambiguate Homographs in Modern Arabic Texts

Abstract

Diacritic restoration (also known as diacritization or vowelization) is the process of inserting the correct diacritical marks into a text. Modern Arabic is typically written without diacritics, e.g., in newspapers. The absence of diacritical marks often causes ambiguity, and although native speakers are adept at resolving it, they occasionally fail. Diacritic restoration is a classical problem in computer science, but whereas most work tackles the full (heavy) diacritization of text, we are interested in diacritizing text with as few diacritics as possible. Studies have shown that fully diacritized text is visually displeasing and slows down reading. This article proposes a system that diacritizes homographs using the least number of diacritics, hence the name "light." Homographs form a large class of words; we deal with those that share the spelling but not the meaning. With fewer diacritics, we expect no effect on reading speed, while eye strain is reduced. The system comprises a morphological analyzer and a context-similarity component. The morphological analyzer generates all diacritization candidates for a word; a statistical approach combined with context similarity then resolves the homographs. Experimentally, the system shows very promising results, with a best accuracy of 85.6%.
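The pipeline described above (analyzer-generated candidates, then statistical context matching) can be sketched as follows. This is an illustrative toy, not the authors' system: the candidate set and the context words for the homograph علم are invented for the example.

```python
# Toy sketch of the disambiguation step: a morphological analyzer is assumed
# to have produced the diacritized candidates; context overlap picks one.
from collections import Counter

# Hypothetical analyzer output for the undiacritized form "علم": each
# diacritized candidate is paired with words it typically co-occurs with
# in a (toy) training corpus.
CANDIDATES = {
    "عِلْم": ["طلب", "نافع", "دراسة"],   # 'ilm: knowledge
    "عَلَم": ["رفرف", "سارية", "دولة"],  # 'alam: flag
    "عَلِمَ": ["أنه", "بالخبر", "أمس"],  # 'alima: he knew
}

def context_score(candidate_contexts, sentence_words):
    """Count how many sentence words appear among the candidate's
    typical context words (a stand-in for context similarity)."""
    ctx = Counter(candidate_contexts)
    return sum(ctx[w] for w in sentence_words)

def disambiguate(sentence_words):
    """Return the diacritized candidate whose contexts best match the
    sentence; ties fall back to the first-listed candidate."""
    return max(CANDIDATES, key=lambda c: context_score(CANDIDATES[c], sentence_words))

# A sentence about something fluttering on a mast selects the 'flag' reading.
print(disambiguate(["رفرف", "فوق", "سارية"]))  # -> عَلَم
```

In the full system, only the minimal subset of the winning candidate's diacritics needed to separate it from its competitors would actually be emitted, which is what makes the scheme "light."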

Similar Papers
  • Conference Article
  • Cited by: 16
  • 10.18653/v1/2020.acl-main.732
A Multitask Learning Approach for Diacritic Restoration
  • Jan 1, 2020
  • Sawsan Alqahtani + 2 more

In many languages, such as Arabic, diacritics are used to specify pronunciations as well as meanings. Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word. This results in more ambiguous text, making computational processing of such text more difficult. Diacritic restoration is the task of restoring missing diacritics in written text. Most state-of-the-art diacritic restoration models are built on character-level information, which helps generalize the model to unseen data but presumably loses useful information at the word level. To compensate for this loss, we investigate the use of multi-task learning to jointly optimize diacritic restoration with related NLP problems, namely word segmentation, part-of-speech tagging, and syntactic diacritization. We use Arabic as a case study since it has sufficient data resources for the tasks we consider in our joint modeling. Our joint models significantly outperform the baselines and are comparable to state-of-the-art models that are more complex, relying on morphological analyzers and/or much more data (e.g., dialectal data).

  • Research Article
  • Cited by: 87
  • 10.1017/s1351324913000284
A survey of automatic Arabic diacritization techniques
  • Oct 10, 2013
  • Natural Language Engineering
  • Aqil M Azmi + 1 more

In Modern Standard Arabic, texts are typically written without diacritical markings. Diacritics are important for clarifying the sense and meaning of words, and their absence may lead to ambiguity even for native speakers. Natives often disambiguate the meaning successfully through context; however, many Arabic applications, such as machine translation, text-to-speech, and information retrieval, are vulnerable to the lack of diacritics. The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. In this paper we discuss the properties of the Arabic language and the issues related to the lack of diacritical marking, followed by a survey of recent algorithms developed to solve the diacritization problem. We also look into future trends for researchers working in this area.

  • Book Chapter
  • Cited by: 3
  • 10.1093/acrefore/9780199384655.013.606
Parts of Speech, Lexical Categories, and Word Classes in Morphology
  • Jan 30, 2020
  • Oxford Research Encyclopedia of Linguistics
  • Jaklin Kornfilt

The term “part of speech” is a traditional one that has been in use since grammars of Classical Greek (e.g., Dionysius Thrax) and Latin were compiled; for all practical purposes, it is synonymous with the term “word class.” The term refers to a system of word classes, whereby class membership depends on similar syntactic distribution and morphological similarity (as well as, in a limited fashion, on similarity in meaning—a point to which we shall return). By “morphological similarity,” reference is made to functional morphemes that are part of words belonging to the same word class. Some examples for both criteria follow: The fact that in English, nouns can be preceded by a determiner such as an article (e.g., a book, the apple) illustrates syntactic distribution. Morphological similarity among members of a given word class can be illustrated by the many adverbs in English that are derived by attaching the suffix –ly, that is, a functional morpheme, to an adjective (quick, quick-ly). A morphological test for nouns in English and many other languages is whether they can bear plural morphemes. Verbs can bear morphology for tense, aspect, and mood, as well as voice morphemes such as passive, causative, or reflexive, that is, morphemes that alter the argument structure of the verbal root. Adjectives typically co-occur with either bound or free morphemes that function as comparative and superlative markers. Syntactically, they modify nouns, while adverbs modify word classes that are not nouns—for example, verbs and adjectives. Most traditional and descriptive approaches to parts of speech draw a distinction between major and minor word classes. The four parts of speech just mentioned—nouns, verbs, adjectives, and adverbs—constitute the major word classes, while a number of others, for example, adpositions, pronouns, conjunctions, determiners, and interjections, make up the minor word classes. 
Under some approaches, pronouns are included in the class of nouns, as a subclass. While the minor classes are probably not universal, (most of) the major classes are. It is largely assumed that nouns, verbs, and probably also adjectives are universal parts of speech. Adverbs might not constitute a universal word class. There are technical terms that are equivalents to the terms of major versus minor word class, such as content versus function words, lexical versus functional categories, and open versus closed classes, respectively. However, these correspondences might not always be one-to-one. More recent approaches to word classes don’t recognize adverbs as belonging to the major classes; instead, adpositions are candidates for this status under some of these accounts, for example, as in Jackendoff (1977). Under some other theoretical accounts, such as Chomsky (1981) and Baker (2003), only the three word classes noun, verb, and adjective are major or lexical categories. All of the accounts just mentioned are based on binary distinctive features; however, the features used differ from each other. While Chomsky uses the two category features [N] and [V], Jackendoff uses the features [Subj] and [Obj], among others, focusing on the ability of nouns, verbs, adjectives, and adpositions to take (directly, without the help of other elements) subjects (thus characterizing verbs and nouns) or objects (thus characterizing verbs and adpositions). Baker (2003), too, uses the property of taking subjects, but attributes it only to verbs. In his approach, the distinctive feature of bearing a referential index characterizes nouns, and only those. Adjectives are characterized by the absence of both of these distinctive features. Another important issue addressed by theoretical studies on lexical categories is whether those categories are formed pre-syntactically, in a morphological component of the lexicon, or whether they are constructed in the syntax or post-syntactically. 
Jackendoff (1977) is an example of a lexicalist approach to lexical categories, while Marantz (1997), and Borer (2003, 2005a, 2005b, 2013) represent an account where the roots of words are category-neutral, and where their membership to a particular lexical category is determined by their local syntactic context. Baker (2003) offers an account that combines properties of both approaches: words are built in the syntax and not pre-syntactically; however, roots do have category features that are inherent to them. There are empirical phenomena, such as phrasal affixation, phrasal compounding, and suspended affixation, that strongly suggest that a post-syntactic morphological component should be allowed, whereby “syntax feeds morphology.”

  • Conference Article
  • Cited by: 14
  • 10.1109/ialp.2012.18
A Pointwise Approach for Vietnamese Diacritics Restoration
  • Nov 1, 2012
  • Tuan Anh Luu + 1 more

The automatic insertion of diacritics into electronic texts is necessary for a number of languages, including French, Romanian, Croatian, Sindhi, and Vietnamese. When diacritics are removed from a word and the resulting string of characters is not a word, it is easy to recover the diacritics. Sometimes, however, the resulting string is also a word, possibly with different grammatical properties or a different meaning, and this makes recovery of the missing diacritics difficult for software as well as for human readers. This paper is the first to study automatic diacritic restoration in Vietnamese texts. Modern Vietnamese is a complex language with many diacritical marks, and white space does not always function as a word separator. This paper proposes a pointwise approach for automatically recovering missing diacritics, using three features for classification: n-grams of syllables, n-grams of syllable types, and dictionary word features. Our experiments show that the proposed method can recover diacritics with a 94.7% accuracy rate.
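The pointwise formulation classifies each syllable independently from features of its neighbourhood. A minimal sketch of the syllable n-gram feature extraction, with feature template names invented for illustration:

```python
# Sketch (not the paper's code) of pointwise feature extraction: for each
# diacritic-stripped syllable, collect surrounding syllable n-grams that a
# per-syllable classifier would consume.

def ngram_features(syllables, i, n=2):
    """Unigram and bigram context features around position i, with
    boundary padding, in the style of pointwise taggers."""
    padded = ["<s>"] * n + syllables + ["</s>"] * n
    j = i + n  # index of the target syllable in the padded sequence
    feats = []
    for off in (-2, -1, 1, 2):            # neighbouring unigrams
        feats.append(f"u[{off}]={padded[j + off]}")
    for off in (-2, -1, 0, 1):            # neighbouring bigrams
        feats.append(f"b[{off}]={padded[j + off]}_{padded[j + off + 1]}")
    return feats

# Features for the middle syllable of a stripped Vietnamese phrase.
feats = ngram_features("toi di hoc".split(), 1)
print(feats[0])  # u[-2]=<s>
```

A classifier trained per syllable on such features then picks the diacritized form independently at each position, which is what makes the approach pointwise.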

  • Conference Article
  • Cited by: 9
  • 10.1109/ialp.2013.30
Machine Translation Approach for Vietnamese Diacritic Restoration
  • Aug 1, 2013
  • Thi Ngoc Diep Do + 3 more

Diacritic marks exist in many languages, such as French, German, Slovak, and Vietnamese. However, for various reasons they are sometimes omitted in writing, which may cause ambiguity for readers of non-diacritic text. The automatic diacritic restoration problem has been addressed in several languages using character-based, word-based, and pointwise approaches, among others. However, these approaches lean heavily on linguistic information and on the size of the training corpus, and they are sometimes language dependent. In this paper, a simple and effective restoration method is presented: machine translation is used as a new solution to the problem. The restoration method has been applied to Vietnamese and integrated into an Android application named VIVA (Vietnamese Voice Assistant) that reads out the content of incoming text messages on a mobile phone. Our experiments show that the proposed restoration method can recover diacritic marks with a 99.0% accuracy rate.
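The translation framing can be illustrated with a toy phrase table: undiacritized Vietnamese is treated as the "source language" and diacritized text as the "target." The table entries and the greedy decoder below are invented for the example; a real system would learn phrase probabilities from a parallel corpus.

```python
# Toy sketch of diacritic restoration as phrase-based translation
# (hypothetical phrase table, not VIVA's actual model or data).
PHRASE_TABLE = {
    ("co", "gang"): ("cố", "gắng"),
    ("hoc", "sinh"): ("học", "sinh"),
    ("di",): ("đi",),
}

def restore(words):
    """Greedy longest-match decoding over the toy phrase table;
    unknown words pass through unchanged."""
    out, i = [], 0
    while i < len(words):
        for span in (2, 1):
            key = tuple(words[i:i + span])
            if key in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[key])
                i += span
                break
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)

print(restore("hoc sinh co gang".split()))  # học sinh cố gắng
```

Multi-syllable phrases are matched before single syllables, which is how the translation view captures context that a per-character model would miss.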

  • Research Article
  • Cited by: 21
  • 10.1016/j.procs.2017.10.106
Automatic minimal diacritization of Arabic texts
  • Jan 1, 2017
  • Procedia Computer Science
  • Rehab Alnefaie + 1 more


  • Research Article
  • Cited by: 1
  • 10.37384/va.2020.16.102
Jaunu burtu veidošana ar diakritiskajām zīmēm latviešu valodas kā svešvalodas apguvēju tekstos [Creating new letters with diacritical marks in the texts of learners of Latvian as a foreign language]
  • May 6, 2020
  • Valodu apguve: problēmas un perspektīva : zinātnisko rakstu krājums = Language Acquisition: Problems and Perspective : conference proceedings
  • Inga Kaija

A Latvian learner corpus, "LaVA," is being built at the Institute of Mathematics and Computer Science, University of Latvia. The corpus includes texts written by beginner learners in their first two semesters of learning Latvian as a foreign language. The texts are written by hand and digitized afterwards, to reduce the issues that could be caused by having to learn not only writing itself but also the use of a foreign keyboard. One feature that cannot be digitized is the new letters created by adding diacritical marks in ways not used in the standard Latvian alphabet. Since learning the letters and diacritical marks of a language is an essential step in learning to write it, this study aims to find instances of such newly made letters and to discuss basic quantitative measures, in order to define hypotheses and areas of interest for further research into such usage. Altogether 322 texts were searched, and 175 examples were found. The number of examples found in 2nd-semester texts was less than half the number found in 1st-semester texts, but the percentage of texts containing examples was higher than expected: more than 33% in the 1st semester and almost 20% in the 2nd. This leads to the conclusion that the phenomenon is quite common but tends to diminish in the second semester. The corpus does not provide data on later semesters, so it cannot be predicted when such instances become a rare, individual feature rather than a common one. The average number of examples per text is not high, however. Counting only texts in which at least one example was found, the average number of examples per text is 2.136 in the 1st semester and 1.690 in the 2nd. Considering that the lowest possible value here is 1, this should not be considered high.
Therefore, using diacritical marks to make new letters, while a common feature of the Latvian interlanguage, could be characterized as casual rather than systemic. However, that does not exclude the possibility of certain patterns of usage. The data collected so far already show that there are some words (such as garšo, viņš, ļoti, četri) for which examples were found in more than one author's texts. Examples of unsuitable diacritical marks are also sometimes found next to letters for which those marks would be suitable. This should be explored more thoroughly using qualitative methods. The corpus keeps growing; its expected size upon completion is 1000 texts. When that size is reached, it would be useful to repeat the study and check whether the larger amount of data confirms the same assumptions. The larger sample would also allow a more detailed quantitative analysis of each letter, each diacritical mark, and the placement of the diacritical mark, as well as of the metadata collected for the corpus, such as the authors' gender, native language, and other spoken languages.

  • Research Article
  • Cited by: 4
  • 10.1038/s41539-024-00237-7
Investigating lexical categorization in reading based on joint diagnostic and training approaches for language learners
  • Apr 10, 2024
  • npj Science of Learning
  • Benjamin Gagl + 1 more

Efficient reading is essential for societal participation, so reading proficiency is a central educational goal. Here, we use an individualized diagnostics and training framework to investigate processes in visual word recognition and evaluate its usefulness for detecting training responders. We (i) motivated a training procedure based on the Lexical Categorization Model (LCM) to introduce the framework. The LCM describes pre-lexical orthographic processing implemented in the left-ventral occipital cortex and is vital to reading. German language learners trained their lexical categorization abilities while we monitored reading speed change. In three studies, most language learners increased their reading skills. Next, we (ii) estimated, for each word, the LCM-based features and assessed each reader’s lexical categorization capabilities. Finally, we (iii) explored machine learning procedures to find the optimal feature selection and regression model to predict the benefit of the lexical categorization training for each individual. The best-performing pipeline increased reading speed from 23% in the unselected group to 43% in the machine-selected group. This selection process strongly depended on parameters associated with the LCM. Thus, training in lexical categorization can increase reading skills, and accurate computational descriptions of brain functions that allow the motivation of a training procedure combined with machine learning can be powerful for individualized reading training procedures.

  • Research Article
  • 10.47119/ijrp1001021620223303
How reading speed is affected by prism correction in exophoric patients
  • May 16, 2022
  • International Journal of Research Publications
  • Avigail Hazut + 1 more

Reading is a crucial part of life, and good reading ability is necessary for daily tasks. People who have difficulty reading (for any reason) can find it very frustrating throughout the day and can suffer symptoms such as headaches and eye strain. Reading speed is one factor that can indicate reading ability. Among the many factors that affect reading speed is the condition of exophoria, in which the eyes tend to diverge and which usually presents with difficulty converging. When reading at a near distance the eyes must converge, making reading more difficult for people with exophoria. The eyes' deviation can be measured in different ways, yielding different amounts of prism needed to correct the exophoria and provide more comfort. In this study, two methods (Fixation Disparity and Maddox Rod) were used to determine how much prism should be prescribed, and reading speed was then tested with each prism value. The results showed no significant difference in reading speed between the two methods, although subjectively there appears to be a trend toward faster reading with prisms measured according to Fixation Disparity.

  • Research Article
  • Cited by: 15
  • 10.18051/univmed.2010.v29.78-83
Accommodative insufficiency as cause of asthenopia in computer-using students
  • Aug 26, 2010
  • Husnun Amalia + 2 more

To date, computer use is widespread throughout the world, and the associated ocular complaints are found in 75-90% of computer users. Symptoms frequently reported by computer users are eyestrain, tired eyes, irritation, redness, blurred vision, diplopia, burning of the eyes, and asthenopia (visual fatigue of the eyes). A cross-sectional study was conducted to determine the etiology of asthenopia in computer-using students. A questionnaire consisting of 15 items was used to assess symptoms experienced by the computer users. The ophthalmological examination comprised visual acuity, the Hirschberg test, near point of accommodation, amplitude of accommodation, near point of convergence, the cover test, and the alternate cover test. A total of 99 computer science students, of whom 69.7% had asthenopia, participated in the study. The symptoms significantly associated with asthenopia were visual fatigue (p=0.031), heaviness in the eye (p=0.002), blurred vision (p=0.001), and headache at the temples or the back of the head (p=0.000). Refractive asthenopia was found in 95.7% of all asthenopia patients, with accommodative insufficiency (AI) constituting the most frequent cause at 50.7%. The duration of computer use per day was not significantly associated with the prevalence of asthenopia (p=0.700). There was a high prevalence of asthenopia among computer science students, mostly caused by refractive asthenopia. Accommodation measurements should be performed more routinely and regularly, possibly as a screening measure, especially in computer users.

  • Research Article
  • Cited by: 3
  • 10.1145/3592603
The Impact of Arabic Diacritization on Word Embeddings
  • Jun 16, 2023
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Mohamed Abbache + 4 more

Word embeddings are used to represent words for text analysis. They play an essential role in many Natural Language Processing (NLP) studies and have contributed hugely to the extraordinary developments in the field in the last few years. In Arabic, diacritic marks are a vital feature for the readability and understandability of the language, yet current Arabic word embeddings are non-diacritized. In this article, we develop and compare word embedding models based on diacritized and non-diacritized corpora to study the impact of Arabic diacritization on word embeddings. We evaluate the models in four ways: clustering of the nearest words, morphological semantic analysis, part-of-speech tagging, and semantic analysis. For a better evaluation, we took on the challenge of creating three new datasets from scratch for the three downstream tasks. We conducted the downstream tasks with eight machine learning algorithms and two deep learning algorithms. Experimental results show that the diacritized model is better able to capture syntactic and semantic relations and to cluster words of similar categories. Overall, the diacritized model outperforms the non-diacritized model. We also obtained further interesting findings; for example, the morphological semantic analysis shows that as the number of target words increases, the advantages of the diacritized model become more pronounced, and that diacritic marks matter more in POS tagging than in the other tasks.
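The intuition for why diacritization helps can be sketched with toy vectors: in a non-diacritized space, the two readings of a homograph collapse into a single averaged vector, diluting its similarity to the neighbours of either reading. The words and vectors below are invented for illustration, not taken from the article's models.

```python
# Toy cosine-similarity comparison between a diacritized and a
# non-diacritized embedding space (hypothetical 2-d vectors).
import math

def cosine(u, v):
    """Standard cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Diacritized space: the two readings of علم keep distinct vectors.
diacritized = {"عِلْم": [0.9, 0.1], "عَلَم": [0.1, 0.9], "راية": [0.2, 0.8]}
# Non-diacritized space: both readings collapse into one averaged vector.
plain = {"علم": [0.5, 0.5], "راية": [0.2, 0.8]}

# "راية" (banner) stays close to the 'flag' reading only when
# diacritics are preserved; the collapsed vector sits in between.
print(cosine(diacritized["عَلَم"], diacritized["راية"]))
print(cosine(plain["علم"], plain["راية"]))
```

The first similarity comes out higher than the second, mirroring the article's finding that the diacritized model clusters words of similar categories more cleanly.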

  • Research Article
  • Cited by: 15
  • 10.1016/j.jestch.2018.09.002
Diacritic restoration of Turkish tweets with word2vec
  • Sep 18, 2018
  • Engineering Science and Technology, an International Journal
  • Zeynep Ozer + 2 more


  • Research Article
  • Cited by: 169
  • 10.1006/brln.1996.0043
Microstates in Language-Related Brain Potential Maps Show Noun–Verb Differences
  • May 1, 1996
  • Brain and Language
  • Thomas Koenig + 1 more


  • Abstract
  • Cited by: 1
  • 10.1016/j.bpj.2012.11.2942
Instances: Incorporating Computational Scientific Thinking Advances into Education & Science Courses
  • Jan 1, 2013
  • Biophysical Journal
  • Sofya Borinskaya + 7 more


  • Conference Article
  • 10.5339/qfarc.2018.ssahpd880
Building a Rich Lexical Resource for Standard Arabic
  • Jan 1, 2018
  • Wajdi Zaghouani + 2 more

