Abstract

Recent emotion recognition models, most of them based on strongly supervised deep learning solutions, are rather successful in recognizing instantaneous emotion expressions. However, when applied to continuous interactions, these models adapt poorly to person-specific and long-term emotion appraisal. In this article, we present an unsupervised neural framework that improves emotion recognition by learning how to describe the continuous affective behavior of individual persons. Our framework is composed of three self-organizing mechanisms: (1) a recurrent growing layer to cluster general emotion expressions, (2) a set of associative layers, acting as affective memories, to model the specific emotional behavior of individual persons, and (3) an online learning layer that provides contextual modeling of continuous emotion expressions. We propose different learning strategies to integrate all three mechanisms and to improve the performance on arousal and valence recognition on the OMG-Emotion dataset. We evaluate our model with a series of experiments ranging from ablation studies, which assess the contribution of each neural component, to an objective comparison with state-of-the-art solutions. The results of the evaluations show good performance in recognizing continuous emotions in monologue videos. Furthermore, we discuss how the model self-regulates the interplay between generalized and personalized emotion perception and how this influences the model’s reliability when recognizing unseen emotion expressions.

Highlights

  • Although it is widely accepted that basic emotional concepts are perceived by different persons around the world in a consistent manner [21], applying a computational model able to recognize emotions in natural scenarios remains a difficult task [11]

  • To address the limitations of our previous work, we propose in this paper the General-GWR layer, which is based on Growing-When-Required (GWR) networks and creates prototype neurons from the multimodal emotion expressions represented by the Cross-Channel Convolutional Neural Network (CCCNN)

  • One limitation of current emotion expression recognition systems, mostly based on end-to-end deep neural models, is their inability to adapt quickly to novel information

Summary

INTRODUCTION

Although it is widely accepted that basic emotional concepts are perceived by different persons around the world in a consistent manner [21], applying a computational model able to recognize emotions in natural scenarios remains a difficult task [11]. Existing solutions address the problem of continual learning in deep learning models by introducing transfer learning techniques [30], [45], neural activation and data distribution regularization [19], [50], and the unsupervised learning of affective features [29], [61]. Most of these models improve performance when evaluated on specific datasets but keep the same limitations when applied to real-world scenarios, as the adaptation process is expensive and slow, demanding many interactions to learn new data instances. We previously proposed a developmental approach for emotion expression recognition [7], [8] that addressed the problem of online adaptation of emotional categories to newly perceived expressions. This model implemented self-organizing layers that create clusters of similar expressions based on audio/visual characteristics extracted from a convolutional neural network. The present work contributes a learning flow that balances the interplay between generalized and personalized emotion perception, resulting in a dynamic system that (1) represents multisensory affect, (2) classifies general emotion expressions into known clusters, (3) learns individualized emotion representations, and (4) provides continuous emotion recognition.
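
To make the idea of self-organizing layers that grow prototype neurons more concrete, the following is a minimal sketch of a Growing-When-Required style update in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `SimpleGWR`, all parameter values, and the simple exponential habituation decay are hypothetical, and the sketch omits edges between neurons, neighbor updates, node removal, and the recurrent context used in the paper. The inputs stand in for feature vectors that, in the described framework, would come from a convolutional network.

```python
import math
import random


class SimpleGWR:
    """Simplified Growing-When-Required sketch (assumption: no edges, no node removal)."""

    def __init__(self, dim, activity_threshold=0.85, habituation_threshold=0.3,
                 learning_rate=0.1, habituation_decay=0.3, seed=0):
        rng = random.Random(seed)
        # Start with two randomly placed prototype neurons.
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(2)]
        # Habituation: 1.0 for a fresh neuron, decays toward 0 each time it wins.
        self.habituation = [1.0, 1.0]
        self.activity_threshold = activity_threshold
        self.habituation_threshold = habituation_threshold
        self.learning_rate = learning_rate
        self.habituation_decay = habituation_decay

    @staticmethod
    def _distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def best_matching_unit(self, x):
        """Return (index, distance) of the prototype closest to x."""
        dists = [self._distance(w, x) for w in self.weights]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def train_step(self, x):
        b, dist = self.best_matching_unit(x)
        activity = math.exp(-dist)  # close to 1.0 when the match is good
        poorly_matched = activity < self.activity_threshold
        well_trained = self.habituation[b] < self.habituation_threshold
        if poorly_matched and well_trained:
            # Grow: insert a new neuron halfway between the winner and the input.
            self.weights.append([(w + xi) / 2 for w, xi in zip(self.weights[b], x)])
            self.habituation.append(1.0)
        else:
            # Adapt: move the winner toward the input, less so as it habituates.
            rate = self.learning_rate * self.habituation[b]
            self.weights[b] = [w + rate * (xi - w) for w, xi in zip(self.weights[b], x)]
        # Habituate the winner (simple exponential decay, an assumption of this sketch).
        self.habituation[b] -= self.habituation_decay * self.habituation[b]


# Toy usage: two well-separated "expression" vectors presented repeatedly.
samples = [[0.0, 0.0], [5.0, 5.0]]
net = SimpleGWR(dim=2)
for _ in range(200):
    for sample in samples:
        net.train_step(sample)
print("prototype neurons:", len(net.weights))
```

The growth condition is the key design choice: a neuron is added only when the current winner both matches the input poorly and has already been trained often, so the layer expands for genuinely novel expressions rather than for every noisy sample.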

Representing Affective Stimuli
General Emotion Expressions
Individualized Emotion Representations
Continuous Emotion Perception
EVALUATING AFFMEM
Datasets and Pre-processing
Metrics
Exp 2: Continuous Emotion Recognition
Results
Evaluation OMG OMG
DISCUSSIONS
Recognizing Emotions in Monologues
The Effect of Generalization
The Effect of Personalization
The Interplay between Personalization and Generalization
Arousal Vs Valence Recognition
CONCLUSIONS
