Abstract

This work investigates the classification of emotions from full-body movements using a novel Convolutional Neural Network-based architecture. The model consists of two shallow networks that process, in parallel, 8-bit RGB images obtained from time intervals of 3D positional data. One network performs coarse-grained modelling in the time domain, while the other applies fine-grained modelling. We show that combining different temporal scales in a single architecture improves classification results on a dataset of short excerpts of performances by professional dancers who interpreted four affective states: anger, happiness, sadness, and insecurity. Additionally, we investigate the effect of data chunk duration, overlap, input image size, and several data augmentation strategies on the proposed method. Recognition results improved when the duration of a data chunk was longer, and improved further when balanced data augmentation was applied. Moreover, we test our method on other existing motion capture datasets and compare the results with prior art. In all experiments, our results surpassed the state-of-the-art approaches, showing that the method generalizes across diverse settings and contexts.
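
As a rough illustration of the pipeline described above, the sketch below encodes a chunk of 3D joint positions as an 8-bit RGB image and feeds it to two shallow convolutional branches operating at different temporal scales. It is a minimal sketch only: the layer sizes, the min-max normalization, and the specific pooling strides used for the coarse and fine branches are assumptions for illustration, not the authors' exact configuration.

```python
# Illustrative sketch only: layer widths, normalization, and the coarse/fine
# pooling strides below are assumptions, not the configuration from the paper.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def chunk_to_rgb(positions: np.ndarray) -> np.ndarray:
    """Map a motion-capture chunk (frames x joints x 3) to an 8-bit RGB image.

    Each joint becomes a row, each frame a column, and the x/y/z coordinates
    fill the three colour channels after min-max scaling to [0, 255].
    """
    lo = positions.min(axis=(0, 1), keepdims=True)
    hi = positions.max(axis=(0, 1), keepdims=True)
    scaled = (positions - lo) / (hi - lo + 1e-8) * 255.0
    # (frames, joints, 3) -> (joints, frames, 3): an image of size joints x frames
    return scaled.transpose(1, 0, 2).astype(np.uint8)

class ShallowBranch(nn.Module):
    """A small convolutional branch; the temporal pooling stride controls
    whether it models the sequence at a fine or a coarse time scale."""
    def __init__(self, time_stride: int):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=(1, time_stride))  # pool along time only
        self.gap = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        return self.gap(x).flatten(1)  # (batch, 32)

class TwoScaleEmotionCNN(nn.Module):
    """Two shallow branches in parallel (fine and coarse temporal scales),
    concatenated and classified into the four affective states."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.fine = ShallowBranch(time_stride=2)    # fine-grained temporal modelling
        self.coarse = ShallowBranch(time_stride=8)  # coarse-grained temporal modelling
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(torch.cat([self.fine(x), self.coarse(x)], dim=1))

# Example: a hypothetical 3-second chunk at 50 fps with 20 joints -> logits.
chunk = np.random.rand(150, 20, 3)                     # stand-in for real mocap data
image = torch.from_numpy(chunk_to_rgb(chunk)).float()  # (20, 150, 3)
batch = image.permute(2, 0, 1).unsqueeze(0) / 255.0    # (1, 3, 20, 150)
logits = TwoScaleEmotionCNN()(batch)                   # (1, 4) emotion scores
```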

Highlights

  • We report the performance of the proposed method, which analyses the data at multiple temporal scales with a two-branch CNN architecture (Section 7.3, Table 5), and compare it with the prior art (Section 7.4, Table 6)

  • When we evaluated our proposed method on our dataset to classify four emotion classes, it achieved an average F1-score of 95%, a 3% improvement over processing the data at a single temporal scale

Introduction

Several studies have acknowledged the importance of expression dynamics for the perception and automatic recognition of emotions [1], [2], [3], [4]. The expressive qualities of full-body movements, i.e., how a movement is performed, provide significant information about the emotional state of a person. Extracting the expressive qualities of a movement conveying an emotion requires temporal analysis. Camurri et al. [6] presented a conceptual framework for the analysis of expressive qualities of movements. Inspired by previous research on human movement perception and dance theories (e.g., Laban Effort [7]), the authors postulate that computational models of expressive qualities should operate on different temporal scales.
