Use of Bidirectional Long Short Term Memory in Spoken Word Detection with reference to the Assamese language

Deepjyoti Kalita, Khurshid Alam Borbora, Dipen Nath

doi:10.17485/ijst/v15i27.655

Open Access

https://doi.org/10.17485/ijst/v15i27.655

Journal: Indian Journal of Science and Technology
Publication Date: Jul 20, 2022
Citations: 1
License type: CC BY

Abstract

Objectives: This work applies a deep learning technique to the identification of spoken words in the Assamese language. Most DNN-based algorithms have so far been applied successfully in image recognition, computer vision, natural language processing and medical image analysis.

Methods: The method used is the Bidirectional Long Short Term Memory (BLSTM) network, which takes both past and future context into account. The speech database for this work is obtained from the repository of the Indian Language Technology Proliferation and Development Center (ILTP-DC). The repository contains 32,335 utterances by 1,000 male and female participants, covering 262 unique native Assamese words. The BLSTM-based recognition model uses 10 of the 262 unique words; the remaining words are used to construct synthesized sentences. The feature extraction module uses 39 feature coefficients, composed of MFCC, ΔMFCC and ΔΔMFCC coefficients.

Findings: The Word Error Rate (WER) of the BLSTM-based recognition model is 18.84%, with an average accuracy of 98.12%, which sets a promising benchmark when compared to recent findings.

Novelty: This work takes a different approach to detecting certain keywords of the Assamese language by adopting a deep learning methodology. Future work aims to improve the detection capability of the model by combining multiple DNN models in a hybrid approach and by including additional features.

Keywords: Bidirectional Long Short Term Memory; Deep Learning; Speech recognition; WER; MFCC
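The 39-coefficient feature vector described in the Methods can be illustrated with a minimal sketch. The abstract does not state how the 39 coefficients are split, so the code assumes the common layout of 13 static MFCCs plus 13 ΔMFCC and 13 ΔΔMFCC values per frame. The regression window of 2 frames, the edge padding by repetition, and the function names `delta` and `stack_features` are likewise illustrative assumptions, not the authors' implementation. The input is synthetic placeholder data rather than real MFCCs.

```python
# Sketch: build 39-dimensional frame vectors from static MFCC frames.
# Assumptions (not stated in the abstract): 13 static coefficients per frame,
# a regression window of 2 frames, and edge frames padded by repetition.


def delta(frames, n_window=2):
    """Regression-based delta of a list of equal-length feature frames."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, n_window + 1):
                # Clamp indices so the first and last frames are repeated.
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = delta(mfcc)   # first-order differences (delta)
    d2 = delta(d1)     # second-order differences (delta-delta)
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]


# Placeholder input: 6 frames of 13 synthetic values rising by 1 per frame.
mfcc = [[float(t)] * 13 for t in range(6)]
features = stack_features(mfcc)
print(len(features), len(features[0]))  # prints "6 39"
```

In a real pipeline the static coefficients would come from a speech front end, and each 39-dimensional frame would be one time step of the input sequence to the BLSTM.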

Full Text