Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

Kusha Sridhar, Carlos Busso

doi:10.1109/taffc.2022.3187336

Open Access

https://doi.org/10.1109/taffc.2022.3187336

Abstract

Predicting valence from speech is an important but challenging problem. The expression of valence in speech has speaker-dependent cues, which contribute to performance that is often significantly lower than for other emotional attributes such as arousal and dominance. A practical approach to improve valence prediction from speech is to adapt the models to the target speakers in the test set. Adapting a speech emotion recognition (SER) system to a particular speaker is a hard problem, especially with deep neural networks (DNNs), since it requires optimizing millions of parameters. This study proposes an unsupervised approach to address this problem by searching the train set for speakers whose acoustic patterns are similar to those of the speaker in the test set. Speech samples from the selected speakers are used to create the adaptation set. This approach leverages transfer learning using pre-trained models, which are adapted with these speech samples. We propose three alternative adaptation strategies: unique speaker, oversampling, and weighting approaches. These methods differ in how they use the adaptation set to personalize the valence models. The results demonstrate that a valence prediction model can be efficiently personalized with these unsupervised approaches, leading to relative improvements as high as 13.52%.
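The speaker-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how acoustic similarity is measured, so everything here is an assumption made for illustration: the cosine similarity between per-speaker mean feature vectors, the `top_k` parameter, and the function names `select_similar_speakers` and `build_adaptation_indices` are hypothetical and are not taken from the paper. No test-set labels are used, which is what makes the selection unsupervised.

```python
import numpy as np


def select_similar_speakers(train_feats, train_speakers, test_feats, top_k=2):
    """Rank training speakers by similarity to one test speaker.

    Assumption: similarity is the cosine between mean acoustic feature
    vectors. The paper's actual similarity measure may differ.
    """
    train_speakers = np.asarray(train_speakers)
    test_centroid = test_feats.mean(axis=0)
    scores = {}
    for spk in np.unique(train_speakers):
        centroid = train_feats[train_speakers == spk].mean(axis=0)
        denom = np.linalg.norm(centroid) * np.linalg.norm(test_centroid)
        # Guard against zero-norm centroids to avoid division by zero.
        scores[str(spk)] = float(centroid @ test_centroid / denom) if denom > 0 else 0.0
    # Highest similarity first; keep the top_k closest training speakers.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


def build_adaptation_indices(train_speakers, selected):
    """Indices of training samples that belong to the selected speakers."""
    return np.flatnonzero(np.isin(np.asarray(train_speakers), selected))


# Toy data (hypothetical): 2-D "acoustic features" for three training speakers.
train_feats = np.array([
    [1.0, 0.0], [0.9, 0.1],      # speaker "a"
    [0.0, 1.0], [0.1, 0.9],      # speaker "b"
    [-1.0, 0.0], [-0.9, -0.1],   # speaker "c"
])
train_speakers = ["a", "a", "b", "b", "c", "c"]
test_feats = np.array([[1.0, 0.05], [0.8, 0.1]])  # unlabeled test-speaker samples

selected = select_similar_speakers(train_feats, train_speakers, test_feats, top_k=2)
adaptation_idx = build_adaptation_indices(train_speakers, selected)
print(selected, adaptation_idx)
```

The returned indices identify the samples that would form the adaptation set for fine-tuning a pre-trained model. How those samples are then used (once each, repeated, or weighted) is where the three strategies named in the abstract differ, and that step is not sketched here.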

Journal: IEEE Transactions on Affective Computing
Publication Date: Oct 1, 2022
Citations: 8
License type: CC BY-NC-ND 4.0

Similar Papers

A Primer on Machine Learning.
Audrene S Edwards ... Tun Jie
Transplantation | VOL. 105
18 Aug 2020

Recognition of Emotions of Speech and Mood of Music: A Review
Gaurav Agarwal ... Sachi Gupta
01 Jan 2018

Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition.
Hua Zhang ... Guojun Dai
Frontiers in Physiology | VOL. 12
02 Mar 2021

A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
Panikos Heracleous ... Akio Yoneyama
PLOS ONE | VOL. 14
15 Aug 2019
