Abstract
Automatic speech recognition (ASR) is one of the most demanding tasks in natural language processing because of its complexity. Recently, deep learning approaches have been applied to this task and have been shown to outperform traditional machine learning approaches such as artificial neural networks (ANNs). In particular, deep learning methods such as long short-term memory (LSTM) have improved ASR performance. However, this method is limited when processing continuous input streams. A traditional LSTM requires four linear (multilayer perceptron, MLP) layers per cell, with a large memory bandwidth for each sequence time step, and it cannot accommodate the many computational units required for processing continuous input streams because the system lacks sufficient memory bandwidth to feed those units. In this study, an enhanced deep learning LSTM recurrent neural network (RNN) model is proposed to resolve this shortcoming. In the proposed model, a "forget gate" is incorporated into the memory block to allow cell states to be reset at the beginning of sub-sequences, which enables the system to process continuous input streams efficiently without increasing the required bandwidth. The standard LSTM architecture is also modified to use the model parameters more effectively. Several CNN-based and sequential models were trained on the same dataset and compared with the proposed model. The LSTM-RNN outperformed the other deep learning models, achieving an accuracy of 99.36% on a well-established public benchmark dataset of spoken English digits.
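To make the described mechanism concrete, the sketch below shows a minimal LSTM memory cell with a forget gate and an explicit cell-state reset at sub-sequence boundaries. This is an illustrative NumPy sketch under our own assumptions, not the authors' implementation; the class name, the `reset` flag, and the weight shapes are hypothetical, and it only highlights the four linear (MLP) maps per cell and the role of the forget gate in clearing old state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCellWithForgetGate:
    """Minimal LSTM cell with a forget gate (illustrative sketch only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # Four linear (MLP) maps per cell: input, forget, output gates and candidate state.
        shape = (hidden_size, input_size + hidden_size)
        self.W_i = rng.normal(0, 0.1, shape); self.b_i = np.zeros(hidden_size)
        self.W_f = rng.normal(0, 0.1, shape); self.b_f = np.zeros(hidden_size)
        self.W_o = rng.normal(0, 0.1, shape); self.b_o = np.zeros(hidden_size)
        self.W_c = rng.normal(0, 0.1, shape); self.b_c = np.zeros(hidden_size)

    def step(self, x, h_prev, c_prev, reset=False):
        # Reset the cell state at the beginning of a new sub-sequence,
        # so a continuous input stream does not accumulate stale state.
        if reset:
            h_prev = np.zeros_like(h_prev)
            c_prev = np.zeros_like(c_prev)
        z = np.concatenate([x, h_prev])
        i = sigmoid(self.W_i @ z + self.b_i)        # input gate
        f = sigmoid(self.W_f @ z + self.b_f)        # forget gate
        o = sigmoid(self.W_o @ z + self.b_o)        # output gate
        c_tilde = np.tanh(self.W_c @ z + self.b_c)  # candidate cell state
        c = f * c_prev + i * c_tilde                # forget gate scales the old state
        h = o * np.tanh(c)
        return h, c
```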