Extraction of adaptive wavelet packet filter‐bank‐based acoustic feature for speech emotion recognition

Yongming Huang, Yue Li, Guobao Zhang, Ao Wu

doi:10.1049/iet-spr.2013.0446

Abstract

In this paper, a wavelet packet (WP)-based acoustic feature extraction approach is proposed for automatic speech emotion recognition (SER). First, the problem of optimising the WP filter-bank structure for a given classification task is formulated as a tree-pruning problem, and different tree-pruning criteria are investigated. On this basis, a novel WP-based feature is introduced for SER, namely discriminative band WP power coefficients. Finally, an SER system is built and extensive experiments are carried out. Experimental results show that the proposed feature considerably improves emotion recognition performance over the conventional mel-frequency cepstral coefficient (MFCC) feature. The proposed feature extraction approach is promising because it extends readily to two-dimensional (2D) facial expression analysis using 2D WP quadtree structures, which in turn points towards a high-quality audio–visual bimodal emotion recognition system.
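To make the idea of WP sub-band power features concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Haar filter pair, a full (unpruned) WP tree of fixed depth, and log mean power per leaf band; the function names (`haar_split`, `wp_leaves`, `wp_power_features`) are hypothetical, and the tree-pruning criteria investigated in the paper are not reproduced here. A pruned filter bank would correspond to stopping the split early at selected nodes.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (approximation, detail), each half length."""
    s = 1.0 / math.sqrt(2.0)
    idx = range(0, len(x) - 1, 2)
    low = [(x[i] + x[i + 1]) * s for i in idx]
    high = [(x[i] - x[i + 1]) * s for i in idx]
    return low, high


def wp_leaves(x, depth):
    """Full WP tree to `depth` (assumption: no pruning).

    Leaves are keyed by their path of 'a' (low-pass) and 'd' (high-pass)
    steps. Paths are in natural order, not in increasing frequency order.
    """
    nodes = {"": list(x)}
    for _ in range(depth):
        nxt = {}
        for path, band in nodes.items():
            low, high = haar_split(band)
            nxt[path + "a"] = low
            nxt[path + "d"] = high
        nodes = nxt
    return nodes


def wp_power_features(frame, depth=3, eps=1e-12):
    """Log mean power of every leaf band of one speech frame."""
    feats = {}
    for path, band in wp_leaves(frame, depth).items():
        power = sum(v * v for v in band) / len(band)
        feats[path] = math.log(power + eps)  # eps avoids log(0) for empty bands
    return feats


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy 8-sample frame
    for path, value in wp_power_features(frame, depth=3).items():
        print(path, round(value, 3))
```

Because the Haar pair is orthonormal, the leaf bands together carry the same energy as the input frame, so each coefficient can be read as the share of frame energy in one sub-band. In practice the frame length should be divisible by 2 to the power of `depth`, otherwise trailing samples are dropped at each split.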

Full Text