Abstract
Multimodal emotion recognition in conversation (ERC) is challenging because the relationships among utterances are complex and semantic fusion across modalities is difficult. Graph learning, recognized for its ability to capture intricate data relations, has been suggested as a solution for ERC. However, existing graph-based ERC models often fail to address fundamental limitations of graph learning, such as assuming pairwise interactions and neglecting high-frequency signals in semantically poor modalities, which leads to an over-reliance on text. While these issues might be negligible in other applications, they are crucial for the success of ERC. In this paper, we propose a novel framework for ERC, namely multimodal graph learning with framelet-based stochastic configuration networks (Frame-SCN). Specifically, framelet-based stochastic configuration networks, which employ 2D directional Haar framelets to extract both low- and high-pass components, are introduced to learn unified semantic embeddings from multimodal data, mitigating prediction biases caused by an excessive reliance on text without introducing an unnecessarily large number of parameters. We also develop a modality-aware information extraction module that extracts both general and sensitive information in a multimodal semantic space, alleviating potential noise issues. Extensive experimental results demonstrate that the proposed Frame-SCN outperforms many state-of-the-art approaches on two widely used multimodal ERC datasets.
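To illustrate what separating low- and high-pass components means, the following is a minimal sketch of a one-level 2D Haar split using NumPy. It uses the standard orthonormal Haar transform as a stand-in, not the 2D directional Haar framelets of Frame-SCN, whose filters are not given in this abstract. The function names `haar2d_decompose` and `haar2d_reconstruct` are hypothetical, and the input is assumed to be a 2D feature array with even dimensions.

```python
import numpy as np

def haar2d_decompose(x):
    """One-level orthonormal 2D Haar split (illustrative stand-in, not Frame-SCN)."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even dimensions")
    a = x[0::2, 0::2]  # top-left entry of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0      # low-pass: smooth local average
    high_h = (a - b + c - d) / 2.0   # high-pass: change across columns
    high_v = (a + b - c - d) / 2.0   # high-pass: change across rows
    high_d = (a - b - c + d) / 2.0   # high-pass: diagonal change
    return low, (high_h, high_v, high_d)

def haar2d_reconstruct(low, highs):
    """Invert haar2d_decompose exactly."""
    high_h, high_v, high_d = highs
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    out[0::2, 0::2] = (low + high_h + high_v + high_d) / 2.0
    out[0::2, 1::2] = (low - high_h + high_v - high_d) / 2.0
    out[1::2, 0::2] = (low + high_h - high_v - high_d) / 2.0
    out[1::2, 1::2] = (low - high_h - high_v + high_d) / 2.0
    return out
```

In this sketch the high-pass bands carry the fine, rapidly varying detail that a purely low-pass (smoothing) operator would discard, and keeping both bands makes the split lossless.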