A classification benchmark for Arabic alphabet phonemes with diacritics in deep neural networks

Eiad Almekhlafi, Moeen Al-Makhlafi, Erlei Zhang, Jun Wang, Jinye Peng

doi:10.1016/j.csl.2021.101274

Abstract

Although Arabic is the fourth most popular language in the world, it has not received sufficient attention in artificial intelligence research, especially in automatic speech recognition (ASR). The key feature of the Arabic language is that its words are pronounced exactly as they are written. Above all, when diacritics are taken into account (throughout this paper, the alphabet is considered with diacritics), there are no words with similar pronunciation and writing. This motivates building an Arabic ASR system that recognizes the phonetics of the alphabet, which in turn requires studying the classification of Arabic alphabet phonemes; that is the aim of this paper. We create a new dataset, called the Arabic alphabet phonetics dataset (AAPD), collected from sound recordings of 1420 people. We build several Arabic alphabet phoneme classification systems using three feature extraction techniques and four deep neural networks. Based on AAPD, we design numerous experiments to compare the performance of the feature extraction and classification methods, and these experiments can serve as a benchmark. Experimental results show that Mel-frequency cepstral coefficients (MFCC) are the most effective feature extraction technique, achieving the highest accuracy; in particular, training time is lowest when the number of Mel bands is set to 20. Additionally, the model that achieves the highest accuracy with the least computational load is the proposed VGG-based model, which reaches an accuracy of 95.68%.

Full Text