Abstract

Continuous emotion recognition is a challenging task and a key component of human-computer interaction; multimodal emotion recognition in particular can improve both the accuracy and robustness of recognition. However, emotion datasets are limited and emotional features are difficult to extract. We present a multi-level segmented, decision-level fusion emotion recognition model to improve recognition performance. In this paper, we predict multimodal dimensional emotional states on the AVEC 2017 dataset. Our model uses a Bidirectional Long Short-Term Memory (BLSTM) network as the multi-level segmented emotional feature learner and a Support Vector Regression (SVR) model for fusion at the decision layer. The BLSTM models different forms of emotional information over time and accounts for the influence of both past and future emotional features on the current prediction, while the SVR compensates for redundant information across modalities. We also take annotation delay and temporal pooling into account in our multimodal dimensional emotion recognition model. The proposed model achieves significant recognition improvements and robust performance. Finally, comparing against baseline methods on the same dataset, our method achieves the best concordance correlation coefficient (CCC) on arousal, at 0.685. Our results show that the proposed multi-level segmented, decision-level fusion model is conducive to improving emotion recognition performance.
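The following is a minimal sketch of the kind of pipeline the abstract describes: a BLSTM produces per-frame predictions for each modality, temporal pooling averages them over segments, and an SVR fuses the unimodal decisions. Feature dimensions, segment length, and the synthetic data are illustrative assumptions, not the paper's actual settings, and annotation-delay compensation (shifting labels relative to features) is omitted here for brevity.

```python
# Hedged sketch: per-modality BLSTM regressors + SVR decision-level fusion.
# All sizes and data below are hypothetical placeholders.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVR

class BLSTMRegressor(nn.Module):
    """Bidirectional LSTM mapping a feature sequence to per-frame arousal scores."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        out, _ = self.blstm(x)                 # (batch, time, 2 * hidden)
        return self.head(out).squeeze(-1)      # (batch, time) per-frame predictions

def temporal_pool(pred, seg_len=25):
    """Average per-frame predictions over fixed-length segments (temporal pooling)."""
    t = pred.shape[-1] // seg_len * seg_len
    return pred[..., :t].reshape(*pred.shape[:-1], -1, seg_len).mean(-1)

# --- decision-level fusion with SVR on synthetic data ---
rng = np.random.default_rng(0)
audio_x = torch.randn(8, 100, 40)              # hypothetical audio feature sequences
video_x = torch.randn(8, 100, 30)              # hypothetical video feature sequences
gold = rng.uniform(-1, 1, size=(8, 4))         # pooled gold arousal annotations

audio_net, video_net = BLSTMRegressor(40), BLSTMRegressor(30)
with torch.no_grad():
    audio_pred = temporal_pool(audio_net(audio_x)).numpy()
    video_pred = temporal_pool(video_net(video_x)).numpy()

# Stack the per-modality decisions as SVR inputs; the SVR learns how to weight
# the unimodal predictions and correct for redundancy between them.
fusion_in = np.stack([audio_pred.ravel(), video_pred.ravel()], axis=1)
fuser = SVR(kernel="rbf").fit(fusion_in, gold.ravel())
fused = fuser.predict(fusion_in)               # fused segment-level arousal estimates
print(fused.shape)
```

In practice the BLSTMs would be trained on each modality before their pooled outputs are passed to the SVR; the untrained networks here only illustrate the data flow.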
