Abstract

Although emotional state does not alter the linguistic content of speech, it is a major determinant in human communication because it provides a great deal of additional feedback. The purpose of speech emotion recognition is to automatically identify the emotional or physiological state of a human being from their voice. In this paper, we propose a novel dual-level architecture, called dual attention-based bidirectional long short-term memory networks (dual attention-BLSTM), to recognize speech emotion. We also confirm that, in the dual-level structure, recognition performance is better when the two levels receive different features as input than when both receive identical features. Experiments on the IEMOCAP database show the advantage of our proposed approach. Our method achieves an average unweighted accuracy (UA) of 70.29%, an improvement of 2.89 percentage points over the best baseline method. The results show that our designed architecture can better learn to distinguish the features that carry emotional information.
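The abstract does not give implementation details, but the structure it describes can be made concrete. Below is a minimal PyTorch sketch of one plausible reading: two attention-equipped BLSTM levels, each fed a different feature stream (hypothetically, MFCCs and spectrogram frames), fused before an emotion classifier. All layer sizes, the soft-attention formulation, the concatenation fusion, and the four-class output (a common IEMOCAP setup) are assumptions for illustration, not the authors' specification.

```python
# Hypothetical sketch of a dual attention-BLSTM; hyperparameters,
# attention form, and fusion scheme are assumptions, not the paper's.
import torch
import torch.nn as nn


class AttentiveBLSTM(nn.Module):
    """One level: a bidirectional LSTM followed by soft attention pooling."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        # Scores each time step; softmax over time yields attention weights.
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.blstm(x)                    # (B, T, 2H)
        w = torch.softmax(self.attn(h), dim=1)  # (B, T, 1), sums to 1 over T
        return (w * h).sum(dim=1)               # attention-pooled (B, 2H)


class DualAttentionBLSTM(nn.Module):
    """Two attention-BLSTM levels over *different* feature streams,
    fused by concatenation (assumed) before the emotion classifier."""

    def __init__(self, dim_a: int, dim_b: int, hidden_dim: int,
                 num_classes: int = 4):
        super().__init__()
        self.level_a = AttentiveBLSTM(dim_a, hidden_dim)
        self.level_b = AttentiveBLSTM(dim_b, hidden_dim)
        self.classifier = nn.Linear(4 * hidden_dim, num_classes)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor):
        z = torch.cat([self.level_a(feats_a), self.level_b(feats_b)], dim=-1)
        return self.classifier(z)


if __name__ == "__main__":
    # Example: 8 utterances, 100 frames, 40-dim MFCCs and 128-dim spectra.
    model = DualAttentionBLSTM(dim_a=40, dim_b=128, hidden_dim=64)
    logits = model(torch.randn(8, 100, 40), torch.randn(8, 100, 128))
    print(logits.shape)  # torch.Size([8, 4])
```

Feeding the two levels distinct feature streams, as in the sketch, mirrors the abstract's finding that heterogeneous inputs outperform identical ones.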
