Abstract

This paper presents a multimodal emotion recognition method that uses a feature-level combination of three-dimensional (3D) geometric features (joint coordinates, distances, and angles), kinematic features such as joint velocity and displacement, and features extracted from daily behavioral patterns, such as the frequency of head nods, hand waves, and body gestures that represent specific emotions. Head, face, hand, body, and speech data were captured from 15 participants using an infrared sensor (Microsoft Kinect). The 3D geometric and kinematic features were derived from the raw feature data of the visual channel. Human emotional behavior-based features were developed using inter-annotator agreement and commonly observed expressions, movements, and postures associated with specific emotions. The features from each modality and the behavioral pattern-based features (e.g., a head shake, arm retraction, and forward body movement depicting anger) were combined to train the multimodal classifier of the emotion recognition system. The classifier was a support vector machine (SVM) trained with 10-fold cross-validation to predict six basic emotions. The results showed an improvement in emotion recognition accuracy (precision increased by 3.28% and recall by 3.17%) when the 3D geometric, kinematic, and human behavioral pattern-based features were combined for multimodal emotion recognition using supervised classification.
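
As a rough illustration of the feature-level fusion and SVM training with 10-fold cross-validation described above, the following Python sketch concatenates per-modality feature blocks and evaluates a classifier with scikit-learn. The array names, feature dimensions, and random placeholder data are assumptions made for illustration only, not the paper's actual feature extraction pipeline.

    # Minimal sketch: feature-level fusion of per-modality feature blocks
    # followed by an SVM evaluated with 10-fold cross-validation.
    # All feature arrays below are hypothetical placeholders.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    n_samples = 200  # hypothetical number of labelled recordings

    # Per-sample feature blocks (placeholders for the extracted features):
    geometric = np.random.rand(n_samples, 60)    # 3D joint coordinates, distances, angles
    kinematic = np.random.rand(n_samples, 40)    # joint velocities and displacements
    behavioral = np.random.rand(n_samples, 10)   # e.g., head-nod / hand-wave frequencies
    labels = np.random.randint(0, 6, n_samples)  # six basic emotion classes

    # Feature-level fusion: concatenate the per-modality blocks into one vector
    fused = np.hstack([geometric, kinematic, behavioral])

    # SVM classifier evaluated with 10-fold cross-validation
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    scores = cross_val_score(clf, fused, labels, cv=10)
    print(f"mean 10-fold accuracy: {scores.mean():.3f}")

Feature-level fusion here simply means concatenating the per-modality vectors into one input before classification, as opposed to training a separate classifier per modality and fusing their decisions afterwards.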

Highlights

  • Emotion responsiveness in automated systems, computers, and assistive robotics greatly improves the quality of interaction with humans [1]

  • The results showed an improvement in emotion recognition accuracy (precision increased by 3.28% and recall by 3.17%) when the 3D geometric, kinematic, and human behavioral pattern-based features were combined for multimodal emotion recognition using supervised classification

  • This study also evaluates emotion recognition accuracy on external 3D datasets, such as Microsoft Research Cambridge 12 (MSRC-12) [22], UCFKinect [23], and MSR Action 3D [24], to measure the generalizability of the multimodal emotion recognition system


Summary

Introduction

Emotion responsiveness in automated systems, computers, and assistive robotics greatly improves the quality of interaction with humans [1]. For these interactions to be successful, it is important that highly accurate emotion recognition systems exist. In addition to detecting emotions from facial features, studies [4,11] have identified that features from the hand and body modalities contribute significantly to the emotion recognition process. This research adopts a multimodal emotion recognition approach and uses three-dimensional data from the visual channel combined with audio features. The implementation uses features extracted from the head, face, hand, body, and speech input channels for the multimodal emotion recognition system.
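
To make the 3D geometric and kinematic features from the visual channel more concrete, the sketch below computes a joint distance, a joint angle, and a joint velocity from raw 3D joint coordinates. The joint indices, frame rate, and the (frames x joints x 3) array layout are illustrative assumptions, not the paper's data format.

    # Minimal sketch of 3D geometric (distance, angle) and kinematic (velocity)
    # features computed from raw skeleton joint coordinates.
    import numpy as np

    def joint_distance(a, b):
        """Euclidean distance between two 3D joint positions."""
        return np.linalg.norm(a - b)

    def joint_angle(a, b, c):
        """Angle (radians) at joint b formed by segments b->a and b->c."""
        v1, v2 = a - b, c - b
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return np.arccos(np.clip(cos, -1.0, 1.0))

    def joint_velocity(track, fps=30.0):
        """Frame-to-frame velocity of one joint track of shape (frames, 3)."""
        return np.diff(track, axis=0) * fps

    # Hypothetical sequence: 100 frames, 20 joints, 3D coordinates per joint
    frames = np.random.rand(100, 20, 3)
    HAND, ELBOW, SHOULDER = 7, 5, 4          # hypothetical joint indices

    dist = joint_distance(frames[0, HAND], frames[0, SHOULDER])
    angle = joint_angle(frames[0, HAND], frames[0, ELBOW], frames[0, SHOULDER])
    vel = joint_velocity(frames[:, HAND])     # (99, 3) velocity vectors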

