Abstract

As an important branch of affective computing, Speech Emotion Recognition (SER) plays a vital role in human–computer interaction. To mine the relevance of signals in audio and increase the diversity of information, Bi-directional Long Short-Term Memory with Directional Self-Attention (BLSTM-DSA) is proposed in this paper. Long Short-Term Memory (LSTM) can learn long-term dependencies from learned local features. Moreover, Bi-directional Long Short-Term Memory (BLSTM) makes the structure more robust through its direction mechanism, because directional analysis can better recognize the hidden emotions in a sentence. At the same time, the autocorrelation of speech frames can compensate for missing information, which motivates introducing the Self-Attention mechanism into SER. The attention weight of each frame is calculated from the outputs of the forward and backward LSTMs separately, rather than after adding them together. Thus, the algorithm can automatically weight speech frames so that the temporal network selects the frames carrying emotional information. When evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Berlin Database of Emotional Speech (EMO-DB), BLSTM-DSA demonstrates satisfactory performance on the task of speech emotion recognition. In particular, for recognizing happiness and anger, BLSTM-DSA achieves the highest recognition accuracies.
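To make the directional attention concrete, the following is a minimal PyTorch-style sketch of the idea, assuming scaled dot-product self-attention applied to each LSTM direction separately; the class name, feature dimensions, mean-pooling, and fusion by concatenation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BLSTMDSA(nn.Module):
    """Illustrative sketch: a BLSTM whose forward and backward outputs
    each receive their own self-attention before the two are fused."""

    def __init__(self, input_dim=40, hidden_dim=128, num_classes=4):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    @staticmethod
    def _self_attend(h):
        # Frame-by-frame similarity (autocorrelation of frames),
        # softmax-normalized into per-frame attention weights.
        scores = h @ h.transpose(1, 2) / h.size(-1) ** 0.5
        return F.softmax(scores, dim=-1) @ h   # (batch, frames, hidden)

    def forward(self, x):                # x: (batch, frames, input_dim)
        out, _ = self.blstm(x)           # (batch, frames, 2 * hidden_dim)
        h_fwd, h_bwd = out.chunk(2, dim=-1)

        # Attention is computed per direction, not on the summed output.
        ctx_fwd = self._self_attend(h_fwd).mean(dim=1)
        ctx_bwd = self._self_attend(h_bwd).mean(dim=1)

        return self.classifier(torch.cat([ctx_fwd, ctx_bwd], dim=-1))

# Usage: a batch of 8 utterances, 200 frames of 40-dim features each.
model = BLSTMDSA()
logits = model(torch.randn(8, 200, 40))  # -> emotion logits of shape (8, 4)
```

In this sketch, each direction's frames attend only to frames from the same direction, mirroring the abstract's point that attention weights are computed from the forward and backward outputs respectively rather than from their sum.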
