Machine Learning–based Modeling and Prediction of the Intrinsic Relationship between Human Emotion and Music

Jun Su,Peng Zhou

doi:10.1145/3534966

Abstract

Human emotion is one of the most complex psychophysiological phenomena and has been reported to be affected significantly by music listening. It is supposed that there is an intrinsic relationship between human emotion and music, which can be modeled and predicted quantitatively in a supervised manner. Here, a heuristic clustering analysis is carried out on large-scale free music archive to derive a genre-diverse music library, to which the emotional response of participants is measured using a standard protocol, consequently resulting in a systematic emotion-to-music profile. Eight machine learning methods are employed to statistically correlate the basic sound features of music audio tracks in the library with the measured emotional response of tested people to the music tracks in a training set and to blindly predict the emotional response from sound features in a test set. This study found that nonlinear methods are more robust and predictable but considerably more time-consuming than linear approaches. The neural networks have strong internal fittability but are associated with a significant overfitting issue. The support vector machine and Gaussian process exhibit both high internal stability and satisfactory external predictability in all used methods; they are considered as promising tools to model, predict, and explain the intrinsic relationship between human emotion and music. The psychological basis and perceptional implication underlying the built machine learning models are also discussed to find out the key music factors that affect human emotion.

Full Text