Abstract
This paper compares the classification ability of several efficient classifiers for recognizing human speech emotions in terms of accuracy, computation time, and feature dimension. Both static and dynamic mel-frequency cepstral coefficients (MFCCs) are derived in the wavelet domain and combined to form a suitable identification-system model. Three popular classifiers, the Gaussian mixture model (GMM), the radial basis function network (RBFN), and the probabilistic neural network (PNN), are used to test the reliability of these derived feature sets. The PNN is shown to outperform both the RBFN and the GMM at low feature dimensions, whereas the GMM gives improved results at large feature dimensions. The combination of wavelet-based MFCCs and their dynamics is more discriminative for classifying speech emotions than either the MFCCs or the wavelet features alone.
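The "dynamic" features referred to above are delta coefficients computed over the static (wavelet-domain MFCC) frames. The abstract does not give the exact formulation used in the paper, so the following is a minimal numpy sketch of the standard delta-regression formula, assuming the static features are already extracted; the function name and the half-window size `N` are illustrative, not taken from the paper.

```python
import numpy as np

def delta(feat, N=2):
    """Delta (dynamic) coefficients via the standard regression formula.

    feat: array of shape (num_frames, num_coeffs) holding static features
          (e.g., wavelet-domain MFCCs), one row per analysis frame.
    N:    half-window length of the regression (2 is a common choice).
    """
    num_frames = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Replicate edge frames so the window is defined at the boundaries.
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for t in range(num_frames):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)
        ) / denom
    return out

# Combined static + dynamic feature vector per frame, as in the paper's setup:
static = np.random.randn(50, 13)            # placeholder for wavelet-domain MFCCs
features = np.hstack([static, delta(static)])  # shape (50, 26)
```

Stacking the deltas alongside the static coefficients doubles the per-frame feature dimension, which is why the abstract discusses classifier behavior as a function of feature dimension.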