Abstract

Multimodal emotion classification (MEC) has been extensively studied in human–computer interaction, healthcare, and other domains. Previous MEC research has utilized identical multimodal annotations (IMAs) to train unimodal models, hindering the learning of effective unimodal representations due to differences between unimodal expressions and multimodal perceptions. Additionally, most MEC fusion techniques fail to consider the unimodal–multimodal inconsistencies. This study addresses two important issues in MEC: learning satisfactory unimodal representations of emotion and accounting for unimodal–multimodal inconsistencies during the fusion process. To tackle these challenges, the authors propose the Two-Stage Conformer-based MEC model (Uni2Mul) with two key innovations: (1) in stage one, unimodal models are trained using independent unimodal annotations (IUAs) to optimize unimodal emotion representations; (2) in stage two, a Conformer-based architecture is employed to fuse the unimodal representations learned in stage one and predict IMAs, accounting for unimodal–multimodal differences. The proposed model is evaluated on the CH-SIMS dataset. The experimental results demonstrate that Uni2Mul outperforms baseline models. This study makes two key contributions: (1) the use of IUAs improves unimodal learning; (2) the two-stage approach addresses unimodal–multimodal inconsistencies during Conformer-based fusion. Uni2Mul advances MEC by enhancing unimodal representation learning and Conformer-based fusion.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.