A review on speech emotion recognition: A survey, recent advances, challenges, and the influence of noise

Swapna Mol George, P Muhamed Ilyas

doi:10.1016/j.neucom.2023.127015

Abstract

Affective computing systems can detect the emotional state and mindset of an individual. Speech Emotion Recognition (SER) is a unimodal affective computing system based on emotional speech data. It is an active area of research in pattern recognition, computer vision, and deep learning. There is a great deal of literature on SER, but only a few of these works consider how SER performs under noisy conditions. A few surveys review SER, but they either do not cover all aspects of SER in noisy environments or do not discuss the details thoroughly. In recent years, researchers have shown growing interest in using SER in real-world conditions and have reported improvements in recognition rate. This review compiles the methods and approaches used for noisy SER in the literature up to mid-2023. It covers noisy SER methods, the datasets used for SER under noisy conditions, the noise used, and the toolkits used for noisy SER. It also examines the classifiers and features used and the limitations of existing research on noisy SER systems. Finally, the review seeks to answer the question "Does noise affect performance?", to which the answer is a resounding yes, as demonstrated by the results obtained from this survey.

Full Text