Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition

Nahyan Al Mahmud, Shahfida Amjad Munni

doi:10.5121/ijma.2020.12501

Abstract

This work compares the performance of several acoustic feature extraction methods in a Bangla speech recognition system built on a Long Short-Term Memory (LSTM) neural network. The acoustic features are a series of vectors that represent the speech signal. They can be classified as either words or sub-word units such as phonemes. Linear predictive coding (LPC) is used first as the acoustic vector extraction technique, chosen for its widespread popularity. Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are then used as alternative extraction techniques; these two methods are designed to approximate the human auditory system. The feature vectors are used to train the LSTM neural network. The resulting models of different phonemes are then compared using two statistical tools, the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of those acoustic features.
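The model comparison described above can be illustrated with a minimal sketch. It assumes each phoneme model is summarized as a Gaussian with a diagonal covariance (a per-dimension mean and variance); this simplification, the function names, and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
import math

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Assumption: each model is described only by per-dimension means and
    variances, so the general matrix formula reduces to a sum over dimensions.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                               # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v                # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))    # variance-mismatch term
    return dist

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance from vector x to a diagonal-covariance Gaussian."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

if __name__ == "__main__":
    # Hypothetical two-dimensional "phoneme models" (toy values only).
    model_a = ([0.0, 0.0], [1.0, 1.0])
    model_b = ([2.0, 0.0], [1.0, 1.0])
    print(bhattacharyya_diag(*model_a, *model_b))      # 0.5
    print(mahalanobis_diag([3.0, 4.0], *model_a))      # 5.0
```

A larger Bhattacharyya distance between two phoneme models indicates less overlap between their feature distributions, while the Mahalanobis distance measures how far a single feature vector lies from a model in units scaled by that model's variance.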

Highlights

The objective is to simulate the human ability to talk, to let computers carry out simple tasks through human-machine interaction, and to turn speech into text through Automatic Speech Recognition (ASR) systems.
The primary target of this study is to examine the efficiency of various acoustic vectors for Bangla speech detection using a Long Short-Term Memory (LSTM) neural network and to assess their performance based on different statistical parameters.
LSTM Neural Network Structure: LSTM and, more generally, recurrent neural network (RNN) based ASR systems [13,14,15] trained with connectionist temporal classification (CTC) [16] have recently been shown to work extremely well when training data is abundant, matching and exceeding the performance of hybrid DNN systems [15].

Summary

INTRODUCTION

Speech is the most effective means of communication among people and the most natural way of conveying information. Only a handful of works have been carried out for Bangla, which is among the most widely spoken languages in the world by number of speakers. Some of these efforts can be found in [8]. The majority of these studies focused on simple word-level detection and worked with a very small database. They did not account for the various dialects spoken in different parts of the country. The primary target of this study is to examine the efficiency of various acoustic vectors for Bangla speech detection using an LSTM neural network and to assess their performance based on different statistical parameters.

Bangla Speech Database

Acoustic Feature Vectors

LSTM Neural Network

Bhattacharyya Distance

Mahalanobis Distance

PERFORMANCE ANALYSIS

DISCUSSION

CONCLUSION

Journal: The International journal of Multimedia & Its Applications	Publication Date: Oct 30, 2020
License type: cc-by

Similar Papers

Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition
Zenodo (CERN European Organization for Nuclear Research) | VOL. -
01 Jan 2020

Performance Analysis of Different Acoustic Features Based on LSTM for Bangla Speech Recognition
Nahyan Al Mahmud
SSRN Electronic Journal | VOL. -
01 Jan 2020

Performance Analysis of Different Acoustic Features based on LSTM for Bangla Speech Recognition
Nahyan Al Mahmud
The International journal of Multimedia & Its Applications | VOL. 12
31 Aug 2020

Prediction of InSAR deformation time-series using a long short-term memory neural network
Yi Chen ... Liya Gao
International Journal of Remote Sensing | VOL. 42
07 Jul 2021
