Abstract

Long short-term memory (LSTM) networks are a powerful model for building an ASR system, whereas standard recurrent networks are generally too inefficient to achieve comparable performance. Although the LSTM architecture addresses these issues, its performance degrades on long contextual information. Recent experiments show that LSTMs and their improved variants, such as Deep LSTM, require considerable tuning during training. In this paper, Deep LSTM models are built on long contextual sentences by selecting optimal values for batch size, number of layers, and activation functions. The paper also presents a comparative study of train and test perplexity alongside the computation of word error rate. Furthermore, we use hybrid discriminative approaches with different numbers of iterations, which yield significant improvements with Deep LSTM networks. Experiments are mainly performed on single sentences or on one to two concatenated sentences. Deep LSTM achieves a performance improvement of 3–4% over conventional Language Models (LMs) and modelling classifier approaches, with an acceptable word error rate, on top of a state-of-the-art Punjabi speech recognition system.
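For concreteness, the sketch below (not the authors' code) shows what such a setup might look like in PyTorch: a stacked "Deep" LSTM language model exposing the hyperparameters the abstract says were tuned (batch size, number of layers), together with the two evaluation quantities it reports, perplexity (exponentiated cross-entropy) and word error rate (word-level edit distance). All names, layer sizes, and the random toy batch are illustrative assumptions.

import math
import torch
import torch.nn as nn

class DeepLSTMLM(nn.Module):
    """Stacked ('Deep') LSTM language model; depth comes from num_layers > 1."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        out, _ = self.lstm(self.embed(tokens))  # out: (batch, seq_len, hidden)
        return self.proj(out)                   # logits over the vocabulary

def wer(ref, hyp):
    """Word error rate: word-level edit distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    d = [[i + j if i * j == 0 else 0 for j in range(len(h) + 1)]
         for i in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),
                          d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(r)

# Toy usage with assumed sizes: score one batch and report perplexity.
vocab_size, batch_size, seq_len = 1000, 32, 20
model = DeepLSTMLM(vocab_size)
tokens = torch.randint(0, vocab_size, (batch_size, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict the next word
loss = nn.functional.cross_entropy(
    model(inputs).reshape(-1, vocab_size), targets.reshape(-1))
print(f"perplexity = {math.exp(loss.item()):.1f}")  # exp(cross-entropy)
print(f"WER = {wer('ik do tin char', 'ik do char'):.2f}")  # one deletion -> 0.25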
