Audio compression with multi-algorithm fusion and its impact in speech emotion recognition

A. Pramod Reddy, V. Vijayarajan

doi:10.1007/s10772-020-09689-9

Abstract

This study examines how audio compression affects speech emotion recognition based on multi-algorithm fusion, compared with traditional single-algorithm approaches. For emotion recognition, the most prominent features, Mel Frequency Cepstral Coefficients (MFCC) and Discrete Wavelet Transform (DWT) features, are extracted from speech samples of the Berlin emotional database and a Telugu (a South Indian language) database. We propose an automatic emotion recognition system (AERS) based on multi-algorithm fusion; AERS aims to monitor and identify a speaker's psychological/emotional state. The extracted features are classified into different emotional states using support vector machine (SVM) and k-nearest neighbour (k-NN) algorithms. Two state-of-the-art codecs, MP3 and Speex, are investigated at different bit rates to determine which configurations preserve emotional intelligibility. An MP3 configuration with a 96 kbps bit rate is recommended for achieving high compression across all emotions. The fusion algorithms also outperformed the individual algorithms: fusing DWT and MFCC features gave an accuracy of 94.2%, compared with 89.1% using DWT and 91.38% using MFCC alone. The proposed method reaches an accuracy of 94% through a multiresolution approach that approximates frequency along with time information.
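The feature pipeline summarized in the abstract (MFCC and DWT features, fused, then classified) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not state the wavelet, frame settings, or fusion level, so a Haar wavelet, 256-sample frames at an assumed 8 kHz sampling rate, and feature-level fusion by concatenation are all assumptions, and only k-NN (not SVM) is shown. The names `fused_features` and `FusionKNN` are hypothetical, and the pure tones are synthetic toy data, not emotional speech.

```python
import math
from collections import Counter

SR = 8000     # assumed sampling rate (Hz)
FRAME = 256   # assumed frame length (samples)
HOP = 256     # assumed hop: non-overlapping frames keep the demo fast


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame, n_filters=20, n_coeffs=12):
    """MFCCs of one frame: Hamming window, power spectrum, mel filterbank, log, DCT-II."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    power = []
    for k in range(n // 2 + 1):  # naive DFT over the non-negative frequency bins
        re = sum(w[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(w[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((re * re + im * im) / n)
    top = hz_to_mel(SR / 2)
    # Filter edges equally spaced on the mel scale, mapped to DFT bin indices.
    edges = [int((n + 1) * mel_to_hz(top * i / (n_filters + 1)) / SR) for i in range(n_filters + 2)]
    log_e = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for k in range(lo, hi):  # triangle rising lo->mid, falling mid->hi
            wgt = (k - lo) / (mid - lo) if k < mid else (hi - k) / (hi - mid)
            e += wgt * power[k]
        log_e.append(math.log(e + 1e-10))
    # Unnormalized DCT-II of the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


def mfcc_mean(signal):
    """Average the per-frame MFCC vectors over the utterance."""
    frames = [signal[i:i + FRAME] for i in range(0, len(signal) - FRAME + 1, HOP)]
    per_frame = [mfcc_frame(f) for f in frames]
    return [sum(col) / len(col) for col in zip(*per_frame)]


def dwt_log_energies(signal, levels=3):
    """Log-energy of each Haar detail band plus the final approximation band."""
    s = 1 / math.sqrt(2)
    approx, feats = list(signal), []
    for _ in range(levels):
        pairs = range(0, len(approx) - 1, 2)  # a trailing odd sample is dropped
        detail = [(approx[i] - approx[i + 1]) * s for i in pairs]
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
        feats.append(math.log(sum(d * d for d in detail) + 1e-10))
    feats.append(math.log(sum(a * a for a in approx) + 1e-10))
    return feats


def fused_features(signal):
    """Feature-level fusion (an assumption): concatenate mean MFCCs and DWT log-energies."""
    return mfcc_mean(signal) + dwt_log_energies(signal)


class FusionKNN:
    """k-NN on z-scored fused features (hypothetical helper, not the paper's classifier)."""

    def fit(self, X, y):
        cols = list(zip(*X))
        self.mu = [sum(c) / len(c) for c in cols]
        self.sd = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                   for c, m in zip(cols, self.mu)]
        self.X = [self._norm(x) for x in X]
        self.y = list(y)
        return self

    def _norm(self, x):
        # Put MFCC and DWT dimensions on a common scale using training statistics.
        return [(v - m) / s for v, m, s in zip(x, self.mu, self.sd)]

    def predict(self, x, k=3):
        z = self._norm(x)
        nearest = sorted(zip((math.dist(z, t) for t in self.X), self.y))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


def tone(freq, n=512):
    """Synthetic pure tone used as toy input."""
    return [0.5 * math.sin(2 * math.pi * freq * t / SR) for t in range(n)]


if __name__ == "__main__":
    train = [(f, "low_pitch") for f in (250, 300, 350, 400)] + \
            [(f, "high_pitch") for f in (2000, 2200, 2400, 2600)]
    model = FusionKNN().fit([fused_features(tone(f)) for f, _ in train],
                            [label for _, label in train])
    print(model.predict(fused_features(tone(330))), model.predict(fused_features(tone(2300))))
```

Normalizing each fused dimension before computing distances matters because MFCCs and wavelet log-energies live on different numeric scales; without it, one feature family could dominate the k-NN distance. Swapping `FusionKNN` for an SVM, or fusing at the score level instead of the feature level, would be equally plausible readings of the abstract.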
