Spoken Arabic Digits Recognition Using Deep Learning

Abdulaziz Saleh Mahfoudh BA WAZIR, Joon Huang CHUAH

doi:10.1109/i2cacis.2019.8825004

Abstract

Speech recognition has undergone tremendous advancement over the past 50 years. The Deep Neural Network (DNN) is one of the most popular methods for speech analysis thanks to its ability to minimize the error rate in optimization problems. This research proposes an Arabic digit speech recognition model utilizing a Recurrent Neural Network (RNN). The model represents each speech signal by its Mel-Frequency Cepstral Coefficients (MFCCs), which are extracted after the signal has been processed for noise reduction and digit separation. The features extracted from each spoken digit are fed into a network with Long Short-Term Memory (LSTM) cells. LSTM cells can handle temporal dependencies that require long-term learning, and they address the vanishing gradient problem associated with RNNs. A dataset of 1040 samples of spoken Arabic digits from different dialects is used in this study: 840 samples are used to train the network and the remaining 200 samples are used for testing. Model training is carried out on a computing system with a Graphics Processing Unit (GPU). The learning parameters of the LSTM model are tuned, reaching an accuracy of 94% during model training. Testing results for the tuned model show that it achieves 69% accuracy when recognizing spoken Arabic digits. Per-digit accuracy is highest for the digit zero, at 80%.
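
To illustrate how a sequence of MFCC frames can be summarized by LSTM cells, the following is a minimal sketch of a single-layer LSTM forward pass in plain Python. It is not the authors' implementation: the function names (`init_lstm`, `lstm_step`, `encode_utterance`), the random weight initialization, and the frame and hidden sizes used in the usage note are all assumptions chosen for illustration, and no training is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def init_lstm(n_in, n_hidden, seed=0):
    """Hypothetical helper: small random weights and zero biases for the
    input (i), forget (f), output (o) gates and the candidate (g)."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                for _ in range(n_hidden)]

    return {name: (mat(), [0.0] * n_hidden) for name in ("i", "f", "o", "g")}


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM time step on a single feature frame x."""
    z = x + h_prev  # concatenate the frame with the previous hidden state
    i = [sigmoid(a) for a in affine(*params["i"], z)]
    f = [sigmoid(a) for a in affine(*params["f"], z)]
    o = [sigmoid(a) for a in affine(*params["o"], z)]
    g = [math.tanh(a) for a in affine(*params["g"], z)]
    # The cell state is updated additively, which is what lets gradients
    # survive over many time steps.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def encode_utterance(params, frames, n_hidden):
    """Run the LSTM over all frames and return the final hidden state."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x in frames:
        h, c = lstm_step(params, x, h, c)
    return h
```

As a usage sketch, assuming 13 coefficients per frame and 8 hidden units (both illustrative values, not taken from the paper), `encode_utterance(init_lstm(13, 8), frames, 8)` returns a fixed-length vector of 8 values for an utterance of any number of frames. A classifier over the ten digits would then be applied to that vector.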

Full Text