Abstract

Over the past few decades, research has focused heavily on automatic speech recognition (ASR). Although ASR for a few languages is close to maturity, ASR for low-resource languages such as Malayalam is still in its infancy. In this work, the authors discuss experiments conducted on accented Malayalam speech data using two approaches: one models spectral features with a deep convolutional neural network, and the other models influential features of the speech signal with an LSTM-RNN. The proposed methodology comprises three distinct stages: dataset preparation, feature extraction, and classification, leading to deep learning models that recognize accent-based spoken sentences in the Malayalam language. The mel-frequency cepstral coefficient (MFCC) algorithm, the short-time Fourier transform (STFT), and mel spectrogram methods are used for feature engineering, and the resulting features representing the speech signals are used to construct the accented ASR system for Malayalam using a long short-term memory (LSTM) recurrent neural network (RNN). A spectrogram dataset has also been constructed from the speech data and used to build an ASR model with a deep convolutional neural network (DCNN). The experimental results show that the LSTM-based RNN outperforms the DCNN on the proposed dataset, which was recorded in a natural environment.
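The feature-engineering stage described above (STFT, mel spectrogram, and MFCC) can be illustrated with a minimal sketch. The sketch below is not the authors' implementation: the sampling rate, FFT size, filter count, and coefficient count are assumed values chosen for illustration, and the mel filterbank is hand-rolled with NumPy/SciPy rather than taken from a speech library.

```python
import numpy as np
from scipy.signal import stft
from scipy.fftpack import dct

# Assumed parameters (the abstract does not specify them)
SR = 16000      # sampling rate in Hz
N_FFT = 512     # STFT window length
N_MELS = 26     # mel filterbank size
N_MFCC = 13     # MFCCs kept per frame

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SR):
    """Triangular filters spaced evenly on the mel scale, 0 Hz .. Nyquist."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fbank

def features(signal, sr=SR):
    """Return (mel_spectrogram, mfcc) frame matrices for one utterance."""
    _, _, Z = stft(signal, fs=sr, nperseg=N_FFT)        # STFT of the signal
    power = np.abs(Z) ** 2                              # power spectrogram
    mel_spec = mel_filterbank() @ power                 # mel spectrogram
    log_mel = np.log(mel_spec + 1e-10)                  # log compression
    mfcc = dct(log_mel, axis=0, norm="ortho")[:N_MFCC]  # MFCC = DCT of log-mel
    return mel_spec, mfcc
```

In a pipeline like the one described, the mel-spectrogram frames (rendered as images) would feed the DCNN branch, while the MFCC frame sequence would feed the LSTM-RNN branch.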
