A multi-stage dynamical fusion network for multimodal emotion recognition.

Sihan Chen, Jiajia Tang, Wanzeng Kong, Li Zhu

doi:10.1007/s11571-022-09851-w

Abstract

In recent years, emotion recognition using physiological signals has become a popular research topic. Physiological signals reflect an individual's real emotional state and are therefore widely applied to emotion recognition. Multimodal signals provide more discriminative information than single-modal signals, which has aroused the interest of researchers in the field. However, current studies on multimodal emotion recognition normally adopt a one-stage fusion method, which overlooks cross-modal interaction. To solve this problem, we propose a multi-stage multimodal dynamical fusion network (MSMDFN). Through the MSMDFN, a joint representation based on cross-modal correlation is obtained. First, the latent and essential interactions among the features extracted independently from each modality are explored in a specific manner. Subsequently, the multi-stage fusion network splits the fusion procedure into multiple stages using the correlations observed in the first step. This allows us to exploit much more fine-grained unimodal, bimodal and trimodal intercorrelations. For evaluation, the MSMDFN was verified on the multimodal benchmark DEAP. The experiments indicate that our method outperforms related one-stage multimodal emotion recognition methods.
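
The two-step idea in the abstract (first measure cross-modal correlation, then fuse in stages guided by it) can be illustrated with a toy sketch. This is not the MSMDFN itself: the abstract does not specify the correlation measure, the fusion operator, or the stage ordering, so the Pearson correlation, the element-wise mean in `fuse`, the "most correlated pair first" rule, and all function and modality names below are assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Return 0.0 for constant vectors, where the correlation is undefined.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def fuse(x, y):
    """Placeholder fusion operator (element-wise mean); an assumption,
    standing in for whatever learned fusion the network uses."""
    return [(a + b) / 2 for a, b in zip(x, y)]


def multi_stage_fusion(features):
    """Fuse exactly three modalities in two stages.

    features: dict mapping a (hypothetical) modality name to its
    independently extracted feature vector; all vectors share one length.
    Returns (first-stage pair, remaining modality, fused vector).
    """
    # Step 1: estimate cross-modal correlation for every modality pair.
    corr = {
        pair: abs(pearson(features[pair[0]], features[pair[1]]))
        for pair in combinations(features, 2)
    }
    # Stage 1 (bimodal): fuse the most strongly correlated pair.
    first, second = max(corr, key=corr.get)
    bimodal = fuse(features[first], features[second])
    # Stage 2 (trimodal): fuse the bimodal result with the last modality.
    (remaining,) = [m for m in features if m not in (first, second)]
    trimodal = fuse(bimodal, features[remaining])
    return (first, second), remaining, trimodal
```

Calling `multi_stage_fusion` with three equal-length vectors returns which pair was fused first, which modality joined in the second stage, and the final joint vector. A one-stage method, by contrast, would combine all three vectors in a single step without consulting the pairwise correlations.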

Full Text