Transformer-based automatic Arabic text diacritization

Ali Assad, Abdul Hadi M Alaidi, Amjad Yousif Sahib, Haider Th Salim Alrikabi, Ahmed Magdy

doi:10.37868/sei.v6i2.id305

Abstract

Automatic text diacritization remains a major challenge in Arabic natural language processing (NLP), and progress has been slow compared with other language processing tasks. This work proposes the first transformer-based approach designed solely for automatic diacritization of Arabic text. By exploiting the attention mechanism, the system captures more of the inherent patterns in Arabic and outperforms both rule-based alternatives and other neural network techniques. The model trained on the Clean-50 dataset achieved a diacritic error rate (DER) of 2.03%, while the model trained on the Clean-400 dataset achieved a DER of 1.37%. Compared with state-of-the-art results, the improvement on Clean-50 is minimal, but on the larger Clean-400 dataset it is notable, indicating that this approach can deliver more accurate results for applications that require precise diacritical marks when larger datasets are available. The method performs even better when given extended input text with overlapping windows, reaching a DER of 1.21% on the Clean-400 dataset.
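The abstract reports its results as a diacritic error rate (DER) but does not spell out how the metric is computed. The sketch below shows one common way to compute it: the share of Arabic letters whose predicted diacritics differ from the reference. It is an illustration, not the authors' evaluation script. The function names, the simplified letter range U+0621 to U+064A, and the choice to count undiacritized letters are all assumptions, and published DER figures often differ in exactly these conventions.

```python
# Arabic diacritic code points: tanween (U+064B-U+064D), short vowels
# (U+064E-U+0650), shadda (U+0651), and sukun (U+0652).
DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def split_diacritics(text):
    """Return a list of (base_char, diacritics) pairs for `text`.

    A diacritic with no preceding base character is dropped.
    """
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def diacritic_error_rate(reference, prediction, arabic_only=True):
    """Fraction of counted characters whose diacritics differ.

    Assumption: with `arabic_only=True`, only characters in the simplified
    range U+0621-U+064A are counted, so spaces, digits, and punctuation
    do not dilute the rate.
    """
    ref = split_diacritics(reference)
    hyp = split_diacritics(prediction)
    if [base for base, _ in ref] != [base for base, _ in hyp]:
        raise ValueError("reference and prediction differ in base characters")

    errors = 0
    total = 0
    for (base, ref_marks), (_, hyp_marks) in zip(ref, hyp):
        if arabic_only and not ("\u0621" <= base <= "\u064A"):
            continue
        total += 1
        # Sort the marks so that shadda+vowel and vowel+shadda compare equal.
        if sorted(ref_marks) != sorted(hyp_marks):
            errors += 1
    return errors / total if total else 0.0


# Three letters, each carrying fatha in the reference; the prediction
# puts damma on the last letter, so one of three letters is wrong.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
prediction = "\u0643\u064E\u062A\u064E\u0628\u064F"
print(f"{diacritic_error_rate(reference, prediction):.2%}")  # prints 33.33%
```

Under this convention, the reported DER of 1.37% would correspond to roughly one wrongly diacritized letter in every 73 counted letters.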

Journal: Sustainable Engineering and Innovation
Publication Date: Nov 29, 2024
License type: CC BY 4.0
