Abstract
We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight renowned professional opera singers in ten different emotions and a neutral state. The states are mapped to ternary arousal and valence labels. We propose a small set of relevant acoustic features based on our previous findings on the same data and compare it with a large-scale state-of-the-art feature set for paralinguistics recognition, the baseline feature set of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE). A feature importance analysis with respect to classification accuracy and correlation of the features with the targets is provided in the paper. Results show that the classification performance with both feature sets is similar for arousal, while the ComParE set is superior for valence. Intra-singer feature ranking criteria further improve the classification accuracy significantly in a leave-one-singer-out cross-validation.
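The leave-one-singer-out evaluation mentioned in the abstract can be sketched with scikit-learn's `LeaveOneGroupOut` splitter. The feature matrix, singer IDs, labels, and classifier below are illustrative stand-ins, not the paper's actual data or model:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(88, 20))          # toy data: 88 sung phrases x 20 acoustic features
y = rng.integers(0, 3, size=88)        # ternary arousal labels (low / mid / high)
singers = np.repeat(np.arange(8), 11)  # 8 singers, 11 states each (10 emotions + neutral)

# Each fold holds out all recordings of one singer, so the test singer is unseen.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=singers):
    clf.fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean leave-one-singer-out accuracy: {np.mean(accuracies):.3f}")
```

With eight singers this yields eight folds; averaging fold accuracies gives a singer-independent estimate, which is the harder and more realistic evaluation condition.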
Highlights
Automatic emotion recognition from speech has been a major research topic for over a decade
This paper investigates the performance of state-of-the-art speech emotion recognition methods on a data set of singing voice recordings and compares this to the performance of a newly designed acoustic feature set, which is based on findings in [18]
We propose a feature set based on a previous, careful analysis of acoustic parameters with respect to emotional expression in the singing and speaking voice, as presented in [18]
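A feature-to-target correlation ranking of the kind used for the importance analysis can be sketched as follows. The feature names and toy data are hypothetical examples of typical acoustic descriptors, not the paper's actual features or results:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 88
# Hypothetical acoustic descriptors; in practice these would be extracted features.
names = ["f0_mean", "f0_range", "loudness_mean", "jitter", "hnr"]
X = rng.normal(size=(n, len(names)))
# Synthetic arousal target, constructed to depend mostly on the first feature.
arousal = X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Rank features by the absolute Pearson correlation with the target.
corrs = [np.corrcoef(X[:, j], arousal)[0, 1] for j in range(len(names))]
ranking = sorted(zip(names, corrs), key=lambda t: -abs(t[1]))
for name, r in ranking:
    print(f"{name:14s} r = {r:+.2f}")
```

Such a ranking can be computed per singer and the per-singer ranks aggregated, which is one way to realise the intra-singer ranking criteria mentioned in the abstract.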
Summary
Automatic emotion recognition from speech has been a major research topic for over a decade. Papers have covered psychological and theoretical aspects of emotion expression in speech (e.g., [1]) and presented early ideas for building systems to recognise emotions expressed in human speech. Emotion recognition from the singing voice has largely been overlooked, even though the expression of emotion in music and singing is a highly visible and important phenomenon [4]. We apply methods from speaking-voice emotion recognition to singing-voice emotion recognition, evaluate classification performances for the first time, and take an in-depth look at important acoustic features. The paper is structured as follows: Section 2 gives an overview of related work and an in-depth introduction to the topic of vocal emotion recognition.
More From: EURASIP Journal on Audio, Speech, and Music Processing