Abstract

Automatic Speech Recognition (ASR) has received considerable attention in recent years. Despite these advances, the potential of recognition based directly on raw speech representations remains underexplored. This paper proposes an Automatic Speech Recognition system for Malayalam speech data using spectrogram images and a Convolutional Neural Network (CNN). Spectrogram (voicegram) images are generated from the sound files and fed into the CNN. The network topology consists of a set of convolutional and fully connected layers, with a softmax layer used for classification. The proposed model achieves an accuracy of 93.33%, indicating that spectrogram image-based approaches give promising results for speech recognition. An analysis of the acoustic characteristics of the Malayalam disyllabic words selected for the ASR system is also conducted, covering formant analysis, voice onset time, and spectral moments computed from 4000 tokens produced by 20 speakers. A comparison between the CNN model and multiple classifiers trained on these acoustic features confirms the advantage of deep neural networks over raw acoustic features.
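
The sketch below illustrates the kind of pipeline the abstract describes: converting speech files to fixed-size spectrogram images and classifying them with a small convolutional network ending in a softmax layer. The library choices (librosa, TensorFlow/Keras), the 128x128 input size, and the layer configuration are illustrative assumptions, not the authors' exact setup.

    # Minimal sketch, assuming librosa and TensorFlow/Keras; sizes and layers are illustrative.
    import numpy as np
    import librosa
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def wav_to_spectrogram(path, n_mels=128, target_frames=128):
        """Load a speech file and convert it to a fixed-size log-mel spectrogram image."""
        y, sr = librosa.load(path, sr=16000)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel, ref=np.max)
        # Pad or truncate along time so every token yields the same image size.
        if log_mel.shape[1] < target_frames:
            pad = target_frames - log_mel.shape[1]
            log_mel = np.pad(log_mel, ((0, 0), (0, pad)), mode="constant")
        else:
            log_mel = log_mel[:, :target_frames]
        return log_mel[..., np.newaxis]  # add a channel axis for the CNN

    def build_cnn(num_classes, input_shape=(128, 128, 1)):
        """Small CNN: stacked convolution blocks, fully connected layers, softmax output."""
        model = models.Sequential([
            layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
            layers.MaxPooling2D(2),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(2),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

In practice the spectrogram images for all tokens would be stacked into an array and passed to model.fit with integer class labels; the actual hyperparameters reported in the paper may differ from those assumed here.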
