Automatic estimation of emotional state has attracted great interest, as emotion is an important component of user-oriented interactive technologies. This paper investigates the use of feed-forward convolutional neural networks (CNNs), and of features extracted from such networks, for predicting the dimensions of continuous-level emotional states. In this context, a two-stream CNN architecture, in which video and audio data are learned simultaneously, is proposed. End-to-end mapping of audiovisual data to emotional dimensions reveals that the two-stream network performs better than its single-stream counterpart. The representations learned by the CNNs are refined through minimum-redundancy maximum-relevance (mRMR) statistical feature selection, and support vector regression applied to the selected CNN-based features then estimates the instantaneous values of the emotional dimensions. The proposed method is trained and tested on audiovisual conversations from the well-known RECOLA and SEMAINE databases. It is experimentally verified that regression on the CNN-based features outperforms both traditional audiovisual affective features and the end-to-end CNN mapping. Generalization experiments further show that the learned representations are robust enough to provide acceptable prediction performance even when the settings of the training and testing datasets differ widely.
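To fix ideas, the following is a minimal sketch of the pipeline the abstract describes, not the paper's implementation: all layer sizes, kernel widths, the names `TwoStreamNet` and `mrmr_select`, the toy data, and the choice of a single target dimension (e.g., arousal) are illustrative assumptions. The sketch fuses a small video stream and audio stream by concatenation, applies a greedy mutual-information-based mRMR selection to the fused features, and fits a support vector regressor on the selected subset.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_regression
from sklearn.svm import SVR

class TwoStreamNet(nn.Module):
    """Video and audio streams learned jointly, fused by concatenation."""
    def __init__(self):
        super().__init__()
        self.video = nn.Sequential(            # per-frame face crops
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.audio = nn.Sequential(            # short raw-waveform snippets
            nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, 9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)           # end-to-end regression output

    def forward(self, frames, wave):
        feat = torch.cat([self.video(frames), self.audio(wave)], dim=1)
        return self.head(feat), feat           # prediction and fused features

def mrmr_select(X, y, k):
    """Greedy mRMR: maximize relevance to the target and penalize mean
    redundancy with already-selected columns, both measured via mutual
    information."""
    rel = mutual_info_regression(X, y)
    chosen = [int(np.argmax(rel))]
    rest = [j for j in range(X.shape[1]) if j != chosen[0]]
    while len(chosen) < k:
        def score(j):
            red = np.mean([mutual_info_regression(X[:, [s]], X[:, j])[0]
                           for s in chosen])
            return rel[j] - red
        best = max(rest, key=score)
        chosen.append(best)
        rest.remove(best)
    return chosen

# Toy batch standing in for synchronized audiovisual clips and one
# continuously annotated emotional dimension.
frames = torch.randn(128, 3, 48, 48)
wave = torch.randn(128, 1, 1600)
y = np.random.default_rng(0).normal(size=128)

net = TwoStreamNet().eval()                    # training loop omitted
with torch.no_grad():
    _, feats = net(frames, wave)
X = feats.numpy()

cols = mrmr_select(X, y, k=8)                  # refine the representation
svr = SVR(kernel="rbf", C=1.0).fit(X[:, cols], y)  # frame-level regression
print("selected feature indices:", cols)
```

In this reading of the pipeline, the network's penultimate (fused) layer plays the role of the learned representation: it is trained end to end, but the final predictions come from the SVR fitted on the mRMR-selected subset of those features, mirroring the comparison the abstract reports.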