Abstract

While the organization of music in terms of emotional affect is a natural process for humans, quantifying it empirically proves to be a very difficult task. Consequently, no acoustic feature (or combination thereof) has emerged as the optimal representation for musical emotion recognition. Due to the subjective nature of emotion, determining whether an acoustic feature domain is informative requires evaluation by human subjects. In this work, we seek to perceptually evaluate two of the most commonly used features in music information retrieval: mel-frequency cepstral coefficients and chroma. Furthermore, to identify emotion-informative feature domains, we explore which musical features are most relevant to perceived emotion, and which acoustic feature domains are most variant or invariant to changes in those musical features. Finally, given our collected perceptual data, we conduct an extensive computational experiment on emotion prediction accuracy across a large number of acoustic feature domains, investigating pairwise prediction both on a general corpus and on a corpus constrained to contain only specific musical feature transformations.

Keywords: emotion, music emotion recognition, features, acoustic features, machine learning, invariance
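As a rough illustration of the two feature domains named above, the sketch below extracts clip-level MFCC and chroma descriptors and scores a cross-validated pairwise (two-emotion) classifier. It assumes librosa and scikit-learn are available; the synthetic "clips", the two-class labels, and the SVM classifier are illustrative stand-ins, not the corpus, feature configurations, or models evaluated in the paper.

```python
# Minimal sketch: MFCC + chroma descriptors and pairwise emotion prediction.
# All data here is synthetic and hypothetical; only the feature calls reflect
# the feature domains discussed in the abstract.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def clip_features(y, sr=22050):
    """Clip-level descriptor: frame-wise MFCC and chroma, averaged over time."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # timbre-oriented domain
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # pitch-class domain
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])

# Stand-in "corpus": synthetic tones playing the role of clips annotated with
# two emotion labels (0 vs. 1) for a single pairwise prediction task.
sr, dur = 22050, 2.0
t = np.linspace(0.0, dur, int(sr * dur), endpoint=False)
rng = np.random.default_rng(0)
clips, labels = [], []
for i in range(10):
    if i % 2 == 0:   # label 0: bright, clean tone
        y = np.sin(2 * np.pi * 880 * t) + 0.01 * rng.standard_normal(t.size)
        labels.append(0)
    else:            # label 1: darker, noisier tone
        y = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.2 * rng.standard_normal(t.size)
        labels.append(1)
    clips.append(y.astype(np.float32))

X = np.vstack([clip_features(y, sr) for y in clips])
y = np.array(labels)

# Cross-validated accuracy for this one emotion pair (illustrative only).
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print("mean pairwise accuracy:", scores.mean())
```

In the same spirit, a corpus-constrained variant of this experiment would restrict the clip pairs to a single musical feature transformation (e.g., only tempo changes) before fitting the classifier, so that prediction accuracy reflects one feature domain's sensitivity to that transformation.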
