Improving Mispronunciation Detection of Arabic Words for Non-Native Learners Using Deep Convolutional Neural Network Features

Shamila Akhtar, Yousaf Bin Zikria, Muhammad Ehatisham-Ul-Haq, Fawad Riasat Raja, Naveed Khan Baloch, Farruh Ishmanov, Fawad Hussain

doi:10.3390/electronics9060963

Abstract

Computer-Aided Language Learning (CALL) is growing because learning new languages is essential for communicating with people of different linguistic backgrounds. Mispronunciation detection is an integral part of CALL, used to automatically point out errors to non-native speakers. In this paper, we investigate mispronunciation detection of Arabic words using a deep Convolutional Neural Network (CNN). For automated pronunciation error detection, we propose a CNN feature-based model that extracts features from different layers of AlexNet (layers 6, 7, and 8) to train three machine learning classifiers: K-nearest neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF). We also use a transfer learning-based model in which feature extraction and classification are performed automatically. To evaluate the proposed method, we compare these methods against a traditional machine learning baseline that uses Mel Frequency Cepstral Coefficient (MFCC) features with the same three classifiers (KNN, SVM, and RF). Experimental results show that the handcrafted features, the transfer learning-based method, and classification based on deep features extracted from AlexNet achieved average accuracies of 73.67%, 85%, and 93.20% on Arabic words, respectively. The proposed method with feature selection thus achieved the best average accuracy, 93.20%, of all the methods.

Highlights

Speech is the semantic element of human communication through which humans convey their messages to each other
We found that the features extracted from AlexNet perform better than transfer learning-based models and handcrafted features in detecting mispronunciation of Arabic words
The proposed framework focuses on mispronunciation detection of Arabic words; in this work, we used three classifiers, Random Forest (RF), Support Vector Machine (SVM), and K-nearest neighbor (KNN), to detect mispronunciation and classify the words correctly
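
The classification stage named above can be illustrated with a minimal sketch of one of the three classifiers, K-nearest neighbor, applied to fixed-length feature vectors. This is not the authors' implementation: the feature vectors are synthetic stand-ins for the AlexNet or MFCC features, and the function names, the two class labels, the vector dimension, and `k=3` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_synthetic_features(n_per_class, dim, seed=0):
    """Hypothetical stand-in for per-word feature vectors (not real speech data)."""
    rng = random.Random(seed)
    X, y = [], []
    # Assumption: two classes whose features cluster around different centres.
    for label, centre in (("correct", 0.0), ("mispronounced", 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test vectors whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y)
    )
    return hits / len(test_y)


train_X, train_y = make_synthetic_features(40, dim=16, seed=0)
test_X, test_y = make_synthetic_features(10, dim=16, seed=1)
acc = accuracy(train_X, train_y, test_X, test_y)
print(f"KNN accuracy on synthetic features: {acc:.2f}")
```

In a real pipeline, the synthetic vectors would be replaced by features extracted from recorded words, and the same train/test interface could be reused to compare KNN against SVM and RF.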

Summary

Introduction

Speech is the semantic element of human communication through which humans convey their messages to each other. Much research has been done to apply speech processing techniques to different languages [1,2,3]. Phonemic errors arise with phones that distinguish one word from another and so change the meaning of speech. In such cases, speakers typically replace a complete phoneme with a similar one, which changes the sound. Many researchers have investigated mispronunciation detection for different languages (English, Mandarin, Japanese, and Dutch), but little work has been done on Arabic. Researchers have used different techniques to detect mispronunciation, including posterior probability-based methods, classifier-based methods, and deep learning-based methods. Authors have worked on some confusing phonemes of the Arabic language, but little work has been done on Arabic words.

Methods

Results

Discussion

Conclusion