Correcting Arabic Soft Spelling Mistakes using BiLSTM-based Machine Learning

Gheith Abandah, Mohammed Z Khedher, Ashraf Suyyagh

doi:10.14569/ijacsa.2022.0130594

Abstract

Soft spelling mistakes are a class of mistakes that is widespread among native Arabic speakers and foreign learners alike. Some of these mistakes are typographical in nature. They occur due to orthographic variations of some Arabic letters and the complex rules that dictate their correct usage. Many people forgo these rules, and because the letters sound identical, they often confuse them. In this paper, we investigate how to use machine learning to correct such mistakes, given that sufficient datasets for training the correction models are not available. Soft error detection and correction is an active field in Arabic natural language processing. We generate training datasets using a proposed transformed input approach and a stochastic error injection approach. These approaches are applied to two acclaimed datasets that represent Classical Arabic and Modern Standard Arabic. We treat the problem as a character-level, one-to-one sequence transcription problem. This one-to-one transcription of mistakes that include omissions and deletions is made possible by the simple transformations we adopt. This approach permits using bidirectional long short-term memory (BiLSTM) models, which are more effective to train than alternatives such as encoder-decoder models. Based on investigating multiple alternatives, we recommend a configuration that has two BiLSTM layers and is trained using the stochastic error injection approach with an error injection rate of 40%. The best model corrects 96.4% of the injected errors and achieves a low character error rate of 1.28% on a real test set of soft spelling mistakes.
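
To make the idea of stochastic error injection and the character error rate metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes that errors are injected by swapping a letter for another member of a hypothetical confusion group with a fixed per-letter probability; the groups listed (alef forms, ta marbuta and ha, alef maqsura and ya) and the function names `inject_errors` and `character_error_rate` are illustrative assumptions. It covers substitutions only, so input and output stay aligned one-to-one by construction; the transformations the paper uses for omitted or deleted letters are not described in the abstract and are not reproduced here.

```python
import random

# Hypothetical confusion groups (an assumption for illustration, not the
# paper's list). Letters in one group are commonly confused in writing.
CONFUSION_GROUPS = [
    "\u0627\u0623\u0625\u0622",  # alef, alef+hamza above, alef+hamza below, alef+madda
    "\u0629\u0647",              # ta marbuta, ha
    "\u0649\u064A",              # alef maqsura, ya
]

# Map each confusable letter to the other letters of its group.
CONFUSABLE = {
    ch: [c for c in group if c != ch]
    for group in CONFUSION_GROUPS
    for ch in group
}


def inject_errors(text, rate=0.4, rng=None):
    """Swap each confusable letter, with probability `rate`, for another
    letter of its group. The output has the same length as the input."""
    if rng is None:
        rng = random.Random()
    out = []
    for ch in text:
        if ch in CONFUSABLE and rng.random() < rate:
            out.append(rng.choice(CONFUSABLE[ch]))
        else:
            out.append(ch)
    return "".join(out)


def character_error_rate(reference, hypothesis):
    """Levenshtein distance between the strings divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        curr = [i]
        for j, h in enumerate(hypothesis, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(reference), 1)


if __name__ == "__main__":
    clean = "\u0625\u0644\u0649 \u0627\u0644\u0645\u062F\u0631\u0633\u0629"
    noisy = inject_errors(clean, rate=0.4, rng=random.Random(0))
    # (noisy, clean) would form one input/target training pair.
    print(noisy, character_error_rate(clean, noisy))
```

Under these assumptions, each `(noisy, clean)` pair has equal length, which is the property that lets a per-character tagger such as a BiLSTM be trained without an encoder-decoder.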

Journal: International Journal of Advanced Computer Science and Applications	Publication Date: Jan 1, 2022
Citations: 5	License type: cc-by

Similar Papers

Automatic Modulation Recognition Based on a DCN-BiLSTM Network
Kai Liu ... Wanjun Gao
Sensors | VOL. 21
24 Feb 2021

Contextualized Satire Detection in Short Texts Using Deep Learning Techniques
Ashraf Kamal ... Muhammad Abulaish
Journal of Web Engineering
27 Mar 2024

A fault detection of aero-engine rolling bearings based on CNN-BiLSTM network integrated cross-attention
Zhilei Jiang ... Chengpu Wu
Measurement Science and Technology | VOL. 35
13 Sep 2024

A Hybrid CNN-BiLSTM Voice Activity Detector
Nicholas Wilkinson ... Thomas Niesler
06 Jun 2021
