Abstract

In this study, we present a deep learning-based implementation of speech emotion recognition (SER). The system combines a deep convolutional neural network (DCNN) and a bidirectional long short-term memory (BLSTM) network with a time-distributed flatten (TDF) layer. The proposed model was applied to the recently built audio-only Bangla emotional speech corpus SUBESCO. A series of experiments was carried out to analyze all the models discussed in this paper under baseline, cross-lingual, and multilingual training-testing setups. The experimental results reveal that the model with a TDF layer, which can work on both temporal and sequential representations of emotions, achieves better performance than other state-of-the-art CNN-based SER models. For the cross-lingual experiments, cross-corpus training, multi-corpus training, and transfer learning were employed for the Bangla and English languages using the SUBESCO and RAVDESS datasets. The proposed model attains state-of-the-art performance, achieving weighted accuracies (WAs) of 86.9% and 82.7% on the SUBESCO and RAVDESS datasets, respectively.
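The sketch below illustrates how such a DCNN + BLSTM architecture with a time-distributed flatten layer can be assembled in Keras. It is a minimal, hypothetical reconstruction: the input shape, filter counts, LSTM width, and number of emotion classes are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal sketch of a DCNN + BLSTM model with a time-distributed flatten
# (TDF) layer. All hyperparameters here are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_EMOTIONS = 7             # assumption: seven emotion classes
INPUT_SHAPE = (128, 128, 1)  # assumption: log-mel spectrogram patch

def build_model():
    inputs = layers.Input(shape=INPUT_SHAPE)

    # Deep CNN front end: stacked Conv2D blocks extract local
    # spectro-temporal features from the spectrogram.
    x = inputs
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2))(x)

    # Time-distributed flatten (TDF): keep the time axis and flatten the
    # remaining frequency/channel axes, yielding one vector per time step.
    x = layers.TimeDistributed(layers.Flatten())(x)

    # BLSTM models the resulting sequence in both directions
    # before classification.
    x = layers.Bidirectional(layers.LSTM(128))(x)
    outputs = layers.Dense(NUM_EMOTIONS, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```

The TimeDistributed(Flatten()) step reflects the key design idea: it preserves the time axis of the convolutional feature maps while flattening the frequency and channel axes, so the BLSTM receives a sequence of per-time-step feature vectors rather than a single globally pooled vector.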

Highlights

  • Identifying human emotions from voice signals, using a machine learning approach, is important to construct a natural-like human-computer interaction (HCI) system

  • The weighted accuracy (WA) achieved by this model is 86.86%, and the average F1 score is also 86.86%

  • In comparison with the other models tested here, we found that the seven-layer convolutional neural network (CNN) architecture provided comparable performance at a fraction of the training time needed by the other architectures

Introduction

Identifying human emotions from voice signals, using a machine learning approach, is important to construct a natural-like human-computer interaction (HCI) system. Selecting the appropriate features for classifying emotions accurately is the most crucial design decision. A wide variety of acoustic features have been employed to construct SER systems in various studies [3]. Those features can be further classified as temporal (time-domain) and spectral (frequency-domain) features. The final result of SER is obtained by the use of a classifier, which allows the system to determine the best match for the input emotional speech. Hidden Markov Models (HMMs), Support Vector Machines (SVMs), Gaussian Mixture Models (GMMs), Artificial Neural Networks (ANNs), decision trees, and ensemble approaches are some well-known classifiers that have been employed in previous studies.
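As a concrete illustration of this classical pipeline, the sketch below extracts spectral features (MFCCs via librosa) and trains one of the classifiers mentioned above (an SVM from scikit-learn). The file names, label handling, and parameter choices are assumptions for demonstration only, not the setup used in this study.

```python
# Illustrative classical SER pipeline: spectral feature extraction
# followed by a conventional classifier. All paths and labels below
# are hypothetical placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(wav_path, n_mfcc=40):
    """Load an utterance and return a fixed-length spectral descriptor."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Average over time to obtain an utterance-level feature vector.
    return mfcc.mean(axis=1)

# Hypothetical dataset: (path, emotion label) pairs. A real corpus would
# supply many utterances per emotion class.
dataset = [("happy_01.wav", "happy"), ("sad_01.wav", "sad")]  # ...

X = np.array([extract_features(path) for path, _ in dataset])
y = np.array([label for _, label in dataset])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```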
