BERT-Based Arabic Diacritization: A state-of-the-art approach for improving text accuracy and pronunciation

Ruba Kharsa, Ashraf Elnagar, Sane Yagi

doi:10.1016/j.eswa.2024.123416

Abstract

Diacritics are essential for accurately representing the meaning and pronunciation of Arabic words and sentences, and researchers have long worked to improve automated diacritization systems. This paper introduces a novel approach to Arabic diacritization based on Bidirectional Encoder Representations from Transformers (BERT) models. The approach is evaluated on two publicly available datasets: the Arabic Diacritization (AD) dataset and the Tashkeela Processed (TP) dataset. Performance is assessed with several error metrics, including Diacritic Error Rate (DER) and Word Error Rate (WER). The findings show that BERT outperforms all models employed in other diacritization systems. On the AD dataset, the proposed system achieved state-of-the-art (SOTA) syntactic DER and WER of 1.14% and 3.34%, respectively. For morphological diacritization, the best results were a DER of 0.92% and a WER of 1.91%. These outcomes represent a relative error reduction of over 30% compared to previous research. On the TP dataset, the BERT models reduced the benchmark DER from 4.0% to 1.11%. The study also introduces a real-time diacritization system called SUKOUN, which provides diacritized text through a user-friendly website. A comparison with existing automatic diacritization tools on six example texts shows that SUKOUN predicts diacritics more accurately and better preserves the input format.
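
The abstract reports DER, WER, and relative error reduction without defining them. The sketch below shows one common way such metrics are computed; it is an illustration under stated assumptions, not the authors' evaluation script. It assumes DER is the fraction of base characters whose attached diacritics differ from the reference, WER is the fraction of words containing at least one such error, and relative reduction is (baseline - new) / baseline. The function names `split_diacritics`, `der_wer`, and `relative_reduction` are hypothetical, and published evaluations often differ in details such as whether undiacritized characters or case endings are counted.

```python
# Assumption: the eight core Arabic diacritic marks, U+064B through U+0652
# (tanween forms, fatha, damma, kasra, shadda, sukun).
ARABIC_DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return a list of (base_char, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in ARABIC_DIACRITICS and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)  # attach mark to the preceding base
        else:
            pairs.append((ch, ""))
    return pairs


def der_wer(reference, prediction):
    """Compute (DER, WER) for two diacritized versions of the same text.

    Both strings must contain the same words and base characters;
    only the diacritics may differ.
    """
    ref_words = reference.split()
    pred_words = prediction.split()
    if len(ref_words) != len(pred_words):
        raise ValueError("reference and prediction have different word counts")

    char_total = char_errors = word_errors = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_pairs = split_diacritics(ref_word)
        pred_pairs = split_diacritics(pred_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in pred_pairs]:
            raise ValueError("base characters differ between the two texts")
        # Compare marks order-insensitively (e.g. shadda+fatha vs fatha+shadda).
        errors = sum(
            sorted(ref_marks) != sorted(pred_marks)
            for (_, ref_marks), (_, pred_marks) in zip(ref_pairs, pred_pairs)
        )
        char_total += len(ref_pairs)
        char_errors += errors
        if errors:
            word_errors += 1

    if char_total == 0:
        raise ValueError("empty text")
    return char_errors / char_total, word_errors / len(ref_words)


def relative_reduction(baseline, new):
    """Relative error reduction of `new` with respect to `baseline`."""
    return (baseline - new) / baseline


# Two-word example: the prediction gets the last mark of the first word wrong.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064C"
prediction = "\u0643\u064E\u062A\u064E\u0628\u064F \u0648\u064E\u0644\u064E\u062F\u064C"
der, wer = der_wer(reference, prediction)

# TP dataset figures quoted in the abstract: benchmark DER 4.0%, BERT DER 1.11%.
tp_reduction = relative_reduction(4.0, 1.11)
```

Under these assumed definitions, the toy example has one wrong mark among six base characters and one wrong word out of two, and the TP figures quoted above correspond to a relative DER reduction of roughly 72%.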

Journal: Expert Systems with Applications	Publication Date: Feb 13, 2024
Citations: 2

