Abstract

Automatic emotion recognition has become an important trend in many artificial intelligence (AI) based applications and has been widely explored in recent years. Most research on automated emotion recognition is based on facial expressions or speech signals. Although the influence of the emotional state on body movements is undeniable, this source of expression is still underestimated in automatic analysis. In this paper, we propose a novel method to recognise seven basic emotional states (happy, sad, surprise, fear, anger, disgust and neutral) from body movement. We analyse motion capture data recorded under these seven emotional states by professional actors and actresses using a Microsoft Kinect v2 sensor. We propose a new representation of affective movements based on sequences of body joints: the proposed algorithm builds a sequential model of affective movement from low-level features inferred from the spatial location and orientation of the joints within the tracked skeleton. In the experiments, different deep neural networks were trained and compared on recognising the emotional state of the acquired motion sequences. The results show the feasibility of automatic emotion recognition from sequences of body gestures, which can serve as an additional source of information in multimodal emotion recognition.
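This excerpt does not spell out the exact feature layout; the following minimal sketch shows one plausible encoding, assuming the 25 joints tracked by the Kinect v2 each contribute a 3-D position and an orientation quaternion, so that a motion sample becomes a sequence of 175-dimensional frame vectors.

```python
import numpy as np

N_JOINTS = 25        # joints tracked by the Kinect v2 skeleton
FEATS_PER_JOINT = 7  # 3-D position (x, y, z) + orientation quaternion (w, x, y, z)

def frame_to_features(positions, orientations):
    """Flatten one skeleton frame into a single feature vector.

    positions    : (25, 3) array of joint coordinates
    orientations : (25, 4) array of joint orientation quaternions
    Returns a vector of length 25 * 7 = 175.
    """
    return np.concatenate([positions, orientations], axis=1).ravel()

def sequence_to_sample(frames):
    """Stack per-frame vectors into a (T, 175) array, i.e. one motion
    sequence ready to be fed to a sequential classifier."""
    return np.stack([frame_to_features(p, q) for p, q in frames])
```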

Highlights

  • People express their feelings through different modalities

  • Convolutional neural network (CNN) models contained 2 to 3 convolutional layers followed by 1 to 2 dense layers, with 50 to 400 neurons per convolutional layer and 50 to 200 neurons per dense layer

  • Recurrent neural network (RNN) and RNN-LSTM models contained 2 to 4 layers, each built from 50 to 400 neurons

  • For all neural network (NN) types, separate models were built, increasing the neuron count on each layer by 25 for each new model

  • For CNN, the best results were obtained with a network of 4 layers: three convolutional layers of 250, 250, and 100 neurons respectively, followed by a dense layer of 100 neurons (a sketch of this configuration follows the list)
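The highlights give layer and neuron counts but not kernel sizes, pooling, input dimensions, or training settings; the sketch below fills those in with placeholder assumptions (1-D convolutions over time with kernel size 3, global max pooling, the Adam optimiser, and the 175-feature frames from the earlier sketch), so it illustrates the reported best CNN configuration rather than reproducing the authors' exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 120      # assumed number of frames per motion sample
N_FEATURES = 175   # assumed per-frame feature size (25 joints x 7 values)
N_CLASSES = 7      # happy, sad, surprise, fear, anger, disgust, neutral

def build_best_cnn():
    """Best-reported CNN: three convolutional layers of 250, 250 and 100
    units, followed by a dense layer of 100 neurons and a softmax output."""
    model = models.Sequential([
        layers.Conv1D(250, kernel_size=3, activation="relu",
                      input_shape=(SEQ_LEN, N_FEATURES)),
        layers.Conv1D(250, kernel_size=3, activation="relu"),
        layers.Conv1D(100, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),          # collapse the time axis
        layers.Dense(100, activation="relu"),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: model = build_best_cnn(); model.fit(X_train, y_train, epochs=...)
# where X_train has shape (n_samples, SEQ_LEN, N_FEATURES) and y_train
# holds integer class labels in the range 0..6.
```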



Introduction

There is evidence that the affective state of individuals is strongly correlated with facial expressions [1], body language [2], voice [3] and different types of physiological changes [4]. Mehrabian formulated the 7-38-55 rule, according to which the meaning of a message is distributed as follows: 7% through verbal signals and words, 38% through vocal cues such as the strength, pitch, and rhythm of the voice, and 55% through body movements and facial expressions [8]. This suggests that words serve primarily to convey information, while body language shapes the conversation or may even substitute for verbal communication. It has to be emphasised that this relation applies only when a communicator is talking about their feelings or attitudes [9].
