Abstract

Musical signals carry rich temporal information not only at the physical level but also at the emotional level. Listeners may wish to find music excerpts whose sequential patterns of musical emotion are similar to those of a given excerpt. Most state-of-the-art systems for emotion-based music retrieval concentrate on static analysis of musical emotion and ignore its dynamics over time. This paper presents a novel approach to music retrieval based on time-varying musical emotion dynamics. A three-dimensional Resonance-Arousal-Valence (RAV) emotion model is used, and the emotions of a piece of music are represented as a time series of emotion dynamics. A multiple dynamic textures (MDT) model is proposed to capture music and emotion dynamics over time, and the expectation-maximization (EM) algorithm, together with Kalman filtering and smoothing, is used to estimate the model parameters. Two smoothing methods, Rauch-Tung-Striebel (RTS) smoothing and minimum-variance smoothing (MVS), are investigated and compared to make the model robust and to find the better choice for improving prediction. To find similar sequential patterns of musical emotion, subsequence dynamic time warping (DTW) is applied to match emotion dynamics. Experimental results demonstrate the benefits of the MDT model for predicting time-varying musical emotion, and the proposed retrieval method based on emotion dynamics outperforms retrieval methods based on acoustic features.
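To make the abstract's modeling pipeline concrete, the sketch below shows the state-space machinery that a dynamic texture rests on: a linear dynamical system x_{t+1} = A x_t + v_t, y_t = C x_t + w_t, filtered forward with a Kalman filter and smoothed backward with the RTS recursions. This is a minimal illustration of the standard algorithms named in the abstract; the matrix names (A, C, Q, R) and the NumPy implementation are our assumptions, not the paper's notation or code.

```python
# Minimal Kalman filter + RTS smoother for a linear dynamical system
# (the building block of a dynamic texture). Assumed shapes:
#   y: (T, p) observations, A: (n, n), C: (p, n), Q: (n, n), R: (p, p).
import numpy as np

def kalman_rts(y, A, C, Q, R, x0, P0):
    """Return RTS-smoothed state means and covariances for sequence y."""
    T, n = y.shape[0], A.shape[0]
    x_p = np.zeros((T, n)); P_p = np.zeros((T, n, n))  # predicted
    x_f = np.zeros((T, n)); P_f = np.zeros((T, n, n))  # filtered

    # Forward pass: Kalman filter.
    for t in range(T):
        if t == 0:
            x_p[0], P_p[0] = x0, P0          # the prior is the first prediction
        else:
            x_p[t] = A @ x_f[t - 1]
            P_p[t] = A @ P_f[t - 1] @ A.T + Q
        S = C @ P_p[t] @ C.T + R             # innovation covariance
        K = P_p[t] @ C.T @ np.linalg.inv(S)  # Kalman gain
        x_f[t] = x_p[t] + K @ (y[t] - C @ x_p[t])
        P_f[t] = (np.eye(n) - K @ C) @ P_p[t]

    # Backward pass: Rauch-Tung-Striebel smoother.
    x_s, P_s = x_f.copy(), P_f.copy()
    for t in range(T - 2, -1, -1):
        J = P_f[t] @ A.T @ np.linalg.inv(P_p[t + 1])
        x_s[t] = x_f[t] + J @ (x_s[t + 1] - x_p[t + 1])
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_p[t + 1]) @ J.T
    return x_s, P_s
```

In an EM setting, the smoothed means and covariances from a pass like this supply the expected sufficient statistics for re-estimating A, C, Q, and R in the M-step.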
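For the retrieval step, the abstract names subsequence DTW for matching emotion trajectories. The sketch below is a textbook subsequence-DTW implementation, assuming trajectories are arrays of emotion points (e.g., RAV triples per frame) and a Euclidean local cost; the function name and cost choice are illustrative, not taken from the paper.

```python
# Subsequence DTW: align a short query trajectory against the best-matching
# contiguous stretch of a longer target trajectory. Inputs are (T, d) arrays
# of per-frame emotion points; local cost is Euclidean distance (assumed).
import numpy as np

def subsequence_dtw(query, target):
    """Return (cost, start, end) for the cheapest match target[start:end+1]."""
    n, m = len(query), len(target)
    cost = np.linalg.norm(query[:, None, :] - target[None, :, :], axis=2)
    D = np.full((n, m), np.inf)
    D[0] = cost[0]                      # the match may begin at any column
    for i in range(1, n):
        D[i, 0] = D[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            D[i, j] = cost[i, j] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    end = int(np.argmin(D[-1]))         # the match may end at any column

    # Backtrack from (n-1, end) to row 0 to recover the start column.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
        else:
            step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
    return float(D[-1, end]), j, end
```

Ranking candidates by this minimal subsequence cost lets a query excerpt retrieve pieces containing a similarly shaped stretch of emotion dynamics, even when the match sits in the middle of a longer piece.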
