Abstract

With the help of feature fusion techniques, multi-modal emotion recognition has achieved great success, aiming at more natural, human-like communication during human-machine interaction. Existing methods focus on designing specific modules to generate better representations in the semantic (spatial) domain. However, we find that the frequency domain can strengthen the emotion correlation within the same category, a property overlooked by previous methods. To exploit this property, we design a novel feature fusion module based on the frequency domain that captures information from both the spatial domain and the frequency domain. Specifically, an attention-based mechanism is combined with the Fourier transform to inject frequency information into the fused feature representation. Furthermore, since analyzing features in the frequency domain may discard some semantic information such as appearance cues, a residual connection is introduced between the feature representation and the final emotion recognition. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed module. In addition, we analyze the limitations and applicability of our method on the existing datasets.
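The mechanism the abstract describes, Fourier transform, attention over the spectrum, and a residual path back to the spatial features, can be pictured with a short sketch. The following is a minimal, illustrative PyTorch reconstruction, not the authors' implementation: the module name FrequencyFusion, the choice of torch.fft.rfft along the temporal axis, and all layer sizes are assumptions made for demonstration.

```python
import torch
import torch.nn as nn


class FrequencyFusion(nn.Module):
    """Minimal sketch of attention-based frequency-domain feature fusion.

    Illustrative only; names and shapes are assumptions, not the paper's code.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Attention runs over the real/imaginary parts of the spectrum,
        # concatenated along the feature axis (hence 2 * dim).
        # Note: num_heads must divide 2 * dim.
        self.attn = nn.MultiheadAttention(2 * dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, 2 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) fused multi-modal features.
        # 1. Move to the frequency domain along the sequence axis.
        spec = torch.fft.rfft(x, dim=1)                    # complex spectrum
        z = torch.cat([spec.real, spec.imag], dim=-1)      # (B, F, 2*dim)

        # 2. Attention injects frequency information across components.
        z_attn, _ = self.attn(z, z, z)
        z = self.proj(z_attn)

        # 3. Back to the spatial (semantic) domain.
        real, imag = z.chunk(2, dim=-1)
        y = torch.fft.irfft(torch.complex(real, imag), n=x.size(1), dim=1)

        # 4. Residual connection preserves semantic cues such as
        #    appearance information that the spectrum may discard.
        return x + y


# Hypothetical usage: fuse (batch=8, frames=50, dim=128) features.
fusion = FrequencyFusion(dim=128)
out = fusion(torch.randn(8, 50, 128))  # same shape as the input
```

Splitting the complex spectrum into real and imaginary channels keeps standard attention layers usable, and the final skip connection mirrors the residual path the abstract describes for preserving appearance cues.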
