Directed Acyclic Graphs for Content Based Sound, Musical Genre, and Speech Emotion Classification

Stavros Ntalampiras

doi:10.1080/09298215.2013.859709

Abstract

This work introduces the methodology of Decision Directed Acyclic Graphs (DDAG)[1] to the scientific domain of content-based audio signal processing. We apply this methodology to three multiclass classification problems involving the categories of generalized sound events, musical genres, and speech expressing emotional states. A decision graph is constructed which breaks the overall problem into a series of two-class problems. The order of the graph nodes is determined using a clustering criterion based on the Kullback-Leibler divergence. Every graph node is composed of two hidden Markov models (HMMs), each one representing one of the two classes that participate in the specific problem. We extract three heterogeneous feature sets (Mel-Filterbank, MPEG-7 Audio Spectrum Projection, and Perceptual Wavelet Packets) from each recording and fuse them for training the HMMs. Extensive comparative experiments are conducted using the following three datasets: (a) a combination of professional sound effects collections, (b) the GTZAN musical genre database, and (c) the BERLIN emotional speech corpus. The results demonstrate the superiority of the DDAG classification approach over the standard HMM approach regardless of the application task.

[1] This work uses the following abbreviations with regard to directed graphs: Directed Acyclic Graph (DAG), Decision Directed Acyclic Graph (DDAG), and Directed Acyclic Graph Hidden Markov Model (DAGHMM).
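
The way a decision graph reduces a multiclass problem to a series of two-class decisions can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes the common elimination form of a DDAG, in which each node compares the first and last remaining candidate classes and discards the loser. The names `ddag_classify`, `node_decision`, and `toy_decision` are hypothetical, the class ordering is simply passed in rather than derived from a Kullback-Leibler criterion, and a toy distance-to-centroid comparison stands in for comparing the likelihoods of two HMMs.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature for a graph node: given two class labels and a
# feature vector, return the label that wins the two-class decision.
NodeDecision = Callable[[str, str, Sequence[float]], str]


def ddag_classify(ordered_classes: List[str],
                  node_decision: NodeDecision,
                  features: Sequence[float]) -> str:
    """Walk an elimination-style decision graph.

    Each step is one two-class problem between the first and last
    remaining candidates; the losing class is removed. With N classes,
    exactly N - 1 node decisions are evaluated.
    """
    candidates = list(ordered_classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = node_decision(first, last, features)
        if winner == first:
            candidates.pop()       # discard the last candidate
        else:
            candidates.pop(0)      # discard the first candidate
    return candidates[0]


# Toy stand-in for a node (assumption): the class whose centroid is
# closer to the mean of the features wins. In the paper's setting this
# would be a comparison between two class-specific HMMs.
CENTROIDS: Dict[str, float] = {"speech": 0.0, "music": 5.0, "noise": 10.0}


def toy_decision(a: str, b: str, features: Sequence[float]) -> str:
    mean = sum(features) / len(features)
    return a if abs(mean - CENTROIDS[a]) <= abs(mean - CENTROIDS[b]) else b


if __name__ == "__main__":
    label = ddag_classify(["speech", "music", "noise"], toy_decision, [4.2])
    print(label)
```

In this sketch the class order only affects which pairs are compared first; in the approach described above, that order is what the Kullback-Leibler-based clustering criterion is meant to determine.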

Full Text