Abstract

Human social interaction is a multimodal process that integrates several communication channels, such as speech, facial expressions, body gestures, and touch. For intelligent robots, the ability to recognize human emotions from these channels is key to natural human-robot interaction. Existing work on multimodal emotion recognition (MER) mainly focuses on facial expressions, speech, text, and electrophysiological signals; few studies implement MER based on touch. In this work, we established a facial expression and touch gesture emotion (FETE) dataset comprising six basic discrete emotions, which is made publicly available for the first time. In addition, we propose a multi-eigenspace based multimodal fusion network (MMFN) for tactile-visual bimodal emotion recognition. The proposed MMFN projects the tactile and visual modalities into multiple eigenspaces to learn modality-specific and shared representations, and it employs multiple classifiers to decode these features. We demonstrated the effectiveness of our network through experiments on the FETE dataset: tactile-visual bimodal emotion recognition with MMFN achieved an accuracy of 80.18%, an improvement of approximately 15% over the single-modality results.
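To make the described architecture concrete, the following is a minimal sketch of a multi-eigenspace style fusion model. It assumes PyTorch, pre-extracted tactile and visual feature vectors, and illustrative layer sizes and class names; the projection dimensions, the single shared space, and the averaged-logits fusion rule are assumptions for illustration, not the authors' exact MMFN.

```python
# Hypothetical sketch: modality-specific and shared eigenspace projections
# followed by multiple classifiers whose logits are fused by averaging.
import torch
import torch.nn as nn


class MultiEigenspaceFusion(nn.Module):
    def __init__(self, tactile_dim=128, visual_dim=512, space_dim=64, num_classes=6):
        super().__init__()
        # Modality-specific eigenspace projections
        self.tactile_proj = nn.Linear(tactile_dim, space_dim)
        self.visual_proj = nn.Linear(visual_dim, space_dim)
        # Shared eigenspace projections (both modalities mapped into one space)
        self.shared_tactile = nn.Linear(tactile_dim, space_dim)
        self.shared_visual = nn.Linear(visual_dim, space_dim)
        # One classifier per representation: tactile-specific, visual-specific, shared
        self.cls_tactile = nn.Linear(space_dim, num_classes)
        self.cls_visual = nn.Linear(space_dim, num_classes)
        self.cls_shared = nn.Linear(space_dim, num_classes)

    def forward(self, tactile, visual):
        # Specific representations for each modality
        t_spec = torch.relu(self.tactile_proj(tactile))
        v_spec = torch.relu(self.visual_proj(visual))
        # Shared representation: both modalities projected into a common space
        shared = torch.relu(self.shared_tactile(tactile)) + torch.relu(self.shared_visual(visual))
        # Decision-level fusion: average the logits from the three classifiers
        logits = (self.cls_tactile(t_spec) + self.cls_visual(v_spec) + self.cls_shared(shared)) / 3
        return logits


# Usage with dummy batches of pre-extracted features (dimensions are placeholders)
model = MultiEigenspaceFusion()
tactile_feat = torch.randn(8, 128)   # e.g., touch-gesture features
visual_feat = torch.randn(8, 512)    # e.g., facial-expression features
print(model(tactile_feat, visual_feat).shape)  # torch.Size([8, 6])
```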
