Abstract
Multimodal emotion recognition, which applies machine learning to multi-modal features extracted from video, has become a research hotspot in artificial intelligence. Traditional multi-modal emotion recognition methods simply concatenate the modalities: the interaction between modalities is underexploited, and the true emotion cannot be recovered well when modal features conflict. This article first proves that effective weighting can improve the discrimination between modalities. Accordingly, this paper accounts for the differing importance of the modalities and assigns them weights through an importance attention network. At the same time, because the modalities are partly complementary, this paper also constructs a complementary-modality attention network. Finally, the reconstructed features are fused to obtain a multi-modal feature with good interaction. The proposed method is compared with traditional methods on public datasets, and the test results show that it performs well on both accuracy and confusion-matrix metrics.
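The abstract describes a pipeline of per-modality importance weighting, cross-modal complementary attention, and fusion. The sketch below illustrates one plausible reading of that pipeline in PyTorch; all dimensions, layer choices, and names (ImportanceAttention, ComplementaryAttention, fuse) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the described pipeline: importance weighting,
# complementary attention across modalities, then fusion. Not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImportanceAttention(nn.Module):
    """Assigns a scalar importance weight to each modality's feature vector."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, dim)
        weights = F.softmax(self.score(feats), dim=1)   # (batch, M, 1)
        return feats * weights                          # reweighted features


class ComplementaryAttention(nn.Module):
    """Lets each modality attend to the others to capture complementary cues."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Attention over the modality axis; residual keeps the originals.
        out, _ = self.attn(feats, feats, feats)
        return feats + out


def fuse(feats: torch.Tensor) -> torch.Tensor:
    # Concatenate the reconstructed per-modality features into one vector.
    return feats.flatten(start_dim=1)                   # (batch, M * dim)


if __name__ == "__main__":
    batch, modalities, dim = 8, 3, 128                  # e.g. audio, video, text
    x = torch.randn(batch, modalities, dim)
    x = ImportanceAttention(dim)(x)
    x = ComplementaryAttention(dim)(x)
    print(fuse(x).shape)                                # torch.Size([8, 384])
```

The softmax over the modality axis realizes the weighting the abstract argues improves inter-modal discrimination, while the attention-plus-residual step is one common way to model the complementary relationship before fusion.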