Abstract

Although emotional state does not alter the linguistic content of speech, it is a major determinant in human communication because it provides a great deal of additional feedback. The purpose of speech emotion recognition is to automatically identify the emotional or physiological state of a human being from their voice. In this paper, we propose a novel dual-level architecture, called dual attention-based bidirectional long short-term memory networks (dual attention-BLSTM), to recognize speech emotion. We also confirm that, in the dual-level structure, recognition performance is better when the two levels receive different features as input than when both receive identical features. Experiments on the IEMOCAP database show the advantage of our proposed approach. Our method achieves an average unweighted accuracy (UA) of 70.29%, an improvement of 2.89 percentage points over the best baseline method. The results show that our designed architecture can better learn to distinguish the features that carry emotional information.
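The abstract does not give implementation details, but the structure it describes can be made concrete. Below is a minimal PyTorch sketch of one plausible reading: two attention-equipped BLSTM levels, each fed a different feature stream (hypothetically, MFCCs and spectrogram frames), fused before an emotion classifier. All layer sizes, the soft-attention formulation, the concatenation fusion, and the four-class output (a common IEMOCAP setup) are assumptions for illustration, not the authors' specification.

```python
# Hypothetical sketch of a dual attention-BLSTM; hyperparameters,
# attention form, and fusion scheme are assumptions, not the paper's.
import torch
import torch.nn as nn


class AttentiveBLSTM(nn.Module):
    """One level: a bidirectional LSTM followed by soft attention pooling."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        # Scores each time step; softmax over time yields attention weights.
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.blstm(x)                    # (B, T, 2H)
        w = torch.softmax(self.attn(h), dim=1)  # (B, T, 1), sums to 1 over T
        return (w * h).sum(dim=1)               # attention-pooled (B, 2H)


class DualAttentionBLSTM(nn.Module):
    """Two attention-BLSTM levels over *different* feature streams,
    fused by concatenation (assumed) before the emotion classifier."""

    def __init__(self, dim_a: int, dim_b: int, hidden_dim: int,
                 num_classes: int = 4):
        super().__init__()
        self.level_a = AttentiveBLSTM(dim_a, hidden_dim)
        self.level_b = AttentiveBLSTM(dim_b, hidden_dim)
        self.classifier = nn.Linear(4 * hidden_dim, num_classes)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor):
        z = torch.cat([self.level_a(feats_a), self.level_b(feats_b)], dim=-1)
        return self.classifier(z)


if __name__ == "__main__":
    # Example: 8 utterances, 100 frames, 40-dim MFCCs and 128-dim spectra.
    model = DualAttentionBLSTM(dim_a=40, dim_b=128, hidden_dim=64)
    logits = model(torch.randn(8, 100, 40), torch.randn(8, 100, 128))
    print(logits.shape)  # torch.Size([8, 4])
```

Feeding the two levels distinct feature streams, as in the sketch, mirrors the abstract's finding that heterogeneous inputs outperform identical ones.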
