Abstract

Owing to the subjective nature of music mood, computationally modeling the affective content of music is challenging. In this work, we present a deep Gaussian process (DGP) regression model for estimating music mood, inspired by deep learning architectures. A variational Bayesian approach is used to learn a Gaussian mixture model representation of the acoustic features. Since exact inference in deep Gaussian processes is intractable, a pseudo-data approximation is used to reduce the training complexity, and a Monte Carlo sampling technique is used to handle the remaining intractability during training. A detailed derivation is presented for a 3-layer DGP that generalizes readily to an L-layer DGP. The proposed approach is evaluated on the PMEmo dataset, which contains valence and arousal annotations of Western popular music, and achieves improvements in the coefficient of determination of 16.9% for arousal and 28.8% for valence estimation relative to a baseline single-layer Gaussian process.
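
The following is a minimal illustrative sketch, not the authors' implementation, of how Monte Carlo samples propagate through a 3-layer DGP prior: each layer's random output becomes the kernel input of the next layer, which is why exact inference is intractable and sampling is needed. The feature dimensionality, hidden-layer widths, and kernel hyperparameters are assumptions chosen only for illustration.

```python
# Sketch: one Monte Carlo draw from a 3-layer deep GP prior (assumed settings).
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def sample_gp_layer(X, out_dim, jitter=1e-6, rng=None):
    """Draw one GP prior sample at inputs X, independently per output dimension."""
    rng = np.random.default_rng() if rng is None else rng
    K = rbf_kernel(X, X) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal((len(X), out_dim))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))   # stand-in for 8-D acoustic feature vectors (assumed)

# Three stacked GP layers; the final layer is 1-D, e.g. a single affect
# dimension such as valence or arousal.
h1 = sample_gp_layer(X, out_dim=5, rng=rng)
h2 = sample_gp_layer(h1, out_dim=5, rng=rng)
y = sample_gp_layer(h2, out_dim=1, rng=rng)
print(y.shape)  # (50, 1): one Monte Carlo draw of the DGP output at the inputs
```

Averaging such draws is one way to approximate the intractable expectations that arise during training; the paper's pseudo-data (inducing-point) approximation additionally conditions each layer on a small set of learned inputs to keep this tractable at scale.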
