Deep Learning Framework with Confused Sub-Set Resolution Architecture for Automatic Arabic Diacritization

Mohsen A A Rashwan, Ahmed Rafea, Ahmad A Al Sallab, Hazem M Raafat

doi:10.1109/taslp.2015.2395255

Abstract

The Arabic language belongs to a group of languages whose script requires diacritics over its characters. Modern Standard Arabic (MSA) text usually omits these diacritics, yet they are essential for many machine learning tasks, such as Text-To-Speech (TTS) systems. In this work, Arabic diacritics restoration is tackled under a deep learning framework that includes the Confused Sub-Set Resolution (CSR) method to improve classification accuracy, together with an Arabic Part-of-Speech (PoS) tagging framework based on deep neural nets. Special focus is given to syntactic diacritization, which, as prior work indicates, still suffers from low accuracy. The system is evaluated against state-of-the-art systems reported in the literature, on challenging datasets collected from different domains. Standard datasets such as the LDC Arabic Treebank are used, in addition to custom datasets that we have made available online so that other researchers can replicate these results. The results show that the proposed techniques significantly outperform other approaches, reducing the syntactic classification error to 9.9% and the morphological classification error to 3%, compared with 12.7% and 3.8% for the best results reported in the literature. This corresponds to a relative error reduction of about 22% over the best reported systems.
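The 22% figure is a relative error reduction, not a drop in percentage points. The short check below applies the formula (baseline - proposed) / baseline to the error rates quoted in the abstract. The function name `relative_error_reduction` is our own illustrative choice, not something taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the reduction of new_error relative to baseline_error, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Error rates (%) quoted in the abstract: (best prior result, proposed system).
reported = {
    "syntactic": (12.7, 9.9),
    "morphological": (3.8, 3.0),
}

for task, (baseline, proposed) in reported.items():
    reduction = relative_error_reduction(baseline, proposed)
    print(f"{task}: {reduction:.1f}% relative error reduction")
```

For the syntactic errors this gives about 22.0%, which matches the abstract. The same formula gives about 21.1% for the morphological errors.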

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Mar 1, 2015
Citations: 41

Similar Papers

Automatic Arabic diacritics restoration based on deep nets
Ahmad Al Sallab ... Ahmed Rafea
01 Jan 2014

ANALISIS KESALAHAN BERBAHASA DALAM RUBRIK “FOKUS” MAJALAH PENDAPA TAMANSISWA [Language error analysis in the “Fokus” column of Pendapa Tamansiswa magazine]
Yosephus Dominikus Fernandez ... Mukhlish Mukhlish
Caraka: Jurnal Ilmu Kebahasaan, Kesastraan, dan Pembelajarannya | VOL. 4
15 Jun 2018

Arabic Diacritization Using Bidirectional Long Short-Term Memory Neural Networks With Conditional Random Fields
Abdulmohsen Al-Thubaity ... Waleed Alsanie
IEEE Access | VOL. 8
01 Jan 2020

Exploring the Performance of Farasa and CAMeL Taggers for Arabic Dialect Tweets
Areej Alshutayri ... Hajer Alwadei
The International Arab Journal of Information Technology | VOL. 20
01 Jan 2023