Abstract

This paper presents a method to automatically detect emotional duality and mixed emotional experience using multimodal audio-visual continuous data. Coordinates, distances, and movements of tracked points were used to create features from the visual input that captured facial expressions, head and hand gestures, and body movement. Spectral and prosodic features were extracted from the audio channel. Audio-visual data, along with depth information, was recorded using an infrared sensor (Kinect). The OpenEAR toolkit and the Face API were used to compute the features. A combined feature vector was created by feature-level fusion, and a support vector machine (SVM) classifier was used for emotion detection. Six participants and 15 actions were used to record simultaneous mixed emotional experiences. The results showed that concurrent emotions can be automatically detected using multiple modalities. The overall accuracy of multimodal mixed emotion recognition was 96.6%. The accuracies obtained from facial expressions (92.4%) and head movement (94.3%) were better than those obtained from hand gestures (77.5%) and body movement (65.2%) alone.
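The abstract describes a feature-level fusion pipeline in which audio and visual descriptors are concatenated into a single vector and classified with an SVM. The sketch below is a minimal illustration of that idea, not the authors' implementation: the feature dimensions, the number of samples, the class labels, and the SVM hyperparameters are all assumptions, and the feature matrices stand in for descriptors that would in practice come from OpenEAR and a face-tracking API.

```python
# Minimal sketch (not the paper's code): feature-level fusion of audio and
# visual descriptors followed by an SVM classifier. Feature dimensions,
# sample counts, labels, and hyperparameters below are placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder feature matrices: one row per recorded action segment.
n_samples = 90                                    # e.g. 6 participants x 15 actions
audio_feats = rng.normal(size=(n_samples, 40))    # spectral + prosodic features
visual_feats = rng.normal(size=(n_samples, 60))   # face/head/hand/body point features
labels = rng.integers(0, 4, size=n_samples)       # hypothetical mixed-emotion classes

# Feature-level fusion: concatenate the modality vectors into one combined vector.
fused = np.hstack([audio_feats, visual_feats])

# SVM classifier trained on the fused representation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))
```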
