Abstract

Automatic speech recognition and speech-based emotion recognition rely on statistical learning methods that are usually highly tuned. Using both the content and the emotional information of spoken utterances offers the opportunity to build human-machine communication that exhibits characteristics of cognitive systems. Although the underlying learning methods are well known, interpreting or evaluating such classifiers is usually challenging. The classifiers identify categorical regions in n-dimensional feature spaces by modelling the observation probability with mixtures of multivariate Gaussian densities. We therefore present an approach that allows a more detailed interpretation of the classifier and provides insight into the method. Our approach is based on the breadth of the resulting Gaussian model that can be generated from the mixture models given by the classifier. We introduce the method and present first results on the EmoDB corpus using a classifier with seven mixtures per emotion. In this exemplary case, the classification performance is 64.48% unweighted average recall over all Leave-One-Speaker-Out tests. Investigating the probability models, we draw first conclusions on the characteristics of the Gaussian mixtures, using the breadth as the only parameter.
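The abstract does not specify how the single Gaussian is derived from the per-emotion mixture or how its breadth is measured. The sketch below illustrates one plausible reading, under stated assumptions: the mixture is collapsed into a single Gaussian by moment matching, and "breadth" is taken here as the root mean variance of the resulting covariance matrix. Both choices, as well as the use of scikit-learn's GaussianMixture and the synthetic feature data, are illustrative assumptions, not the authors' definitions.

```python
# Hypothetical sketch: collapse a per-emotion Gaussian mixture into a single
# moment-matched Gaussian and report a scalar "breadth". The breadth measure
# (sqrt of the mean diagonal variance) is an assumption for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture


def moment_matched_gaussian(gmm):
    """Collapse a fitted GaussianMixture (full covariances) into one Gaussian.

    mu    = sum_k w_k * mu_k
    Sigma = sum_k w_k * (Sigma_k + mu_k mu_k^T) - mu mu^T
    """
    w = gmm.weights_            # shape (K,)
    mu_k = gmm.means_           # shape (K, d)
    Sigma_k = gmm.covariances_  # shape (K, d, d) for covariance_type="full"

    mu = np.einsum("k,kd->d", w, mu_k)
    second_moment = np.einsum(
        "k,kde->de", w, Sigma_k + np.einsum("kd,ke->kde", mu_k, mu_k)
    )
    Sigma = second_moment - np.outer(mu, mu)
    return mu, Sigma


def breadth(Sigma):
    """One possible scalar breadth: root mean variance across feature dimensions."""
    d = Sigma.shape[0]
    return np.sqrt(np.trace(Sigma) / d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for the acoustic feature vectors of one emotion class.
    X = np.vstack(
        [rng.normal(loc=c, scale=1.0, size=(200, 10)) for c in (-2.0, 0.0, 2.0)]
    )

    # Seven mixture components per emotion, mirroring the setup in the abstract.
    gmm = GaussianMixture(n_components=7, covariance_type="full", random_state=0).fit(X)
    mu, Sigma = moment_matched_gaussian(gmm)
    print("breadth of collapsed Gaussian:", breadth(Sigma))
```

In such a setup, one breadth value per emotion class could be compared across the Leave-One-Speaker-Out folds to inspect how compact or diffuse each class model is; this usage is likewise an assumption about how the single parameter might be applied.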
