Abstract

This paper presents an efficient approach to maximizing the accuracy of automatic speech emotion recognition in English using minimal inputs, few features, low algorithmic complexity and reduced processing time. Whereas the findings reported here rely exclusively on vowel formants, most related previous work used tens or even hundreds of other features; despite that heavier signal processing, the recognition accuracies reported earlier were often lower than the one obtained by our approach. The method operates on vowel utterances: the first step is statistical pre-processing of the vowel formants, followed by identification of the best formants using KMeans, k-nearest-neighbor and Naive Bayes classifiers. An artificial neural network used for the final classification achieved an accuracy of 95.6% on elicited emotional speech. Nearly 1500 speech files from ten female speakers, covering the neutral state and six basic emotions, were used to demonstrate the efficiency of the proposed approach. Such a result has not been reported earlier for English and is of significance to researchers, sociologists and others interested in speech.
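The abstract outlines a three-stage pipeline (statistical pre-processing of formants, formant selection via simple classifiers, final ANN classification). The following is a minimal sketch of how such a pipeline might look in scikit-learn; it is not the authors' implementation. The feature matrix `X`, labels `y`, the scoring rule combining the three selectors, and all network parameters are placeholder assumptions, since the abstract does not specify them.

```python
# Hypothetical sketch of the described pipeline; data and scoring rule are
# placeholders, not the paper's actual pre-processing or selection criteria.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Placeholder data: ~1500 utterances, six per-vowel formant statistics each
# (e.g. means and ranges of F1-F3); 7 classes = neutral + six basic emotions.
X = rng.normal(size=(1500, 6))
y = rng.integers(0, 7, size=1500)

# Stage 1: statistical pre-processing (here, simple standardization).
X = StandardScaler().fit_transform(X)

# Stage 2: score each formant feature with KMeans cluster agreement plus
# KNN and Naive Bayes cross-validated accuracy, and keep the best subset.
scores = []
for j in range(X.shape[1]):
    col = X[:, [j]]
    km = KMeans(n_clusters=7, n_init=10, random_state=0).fit(col)
    km_score = adjusted_rand_score(y, km.labels_)
    knn_score = cross_val_score(KNeighborsClassifier(), col, y, cv=5).mean()
    nb_score = cross_val_score(GaussianNB(), col, y, cv=5).mean()
    scores.append(km_score + knn_score + nb_score)
best = np.argsort(scores)[-3:]  # keep the three best-scoring features

# Stage 3: final classification with a small feed-forward neural network.
X_tr, X_te, y_tr, y_te = train_test_split(X[:, best], y, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
ann.fit(X_tr, y_tr)
print(f"held-out accuracy: {ann.score(X_te, y_te):.3f}")
```

On random placeholder data the reported accuracy is meaningless; the sketch only illustrates the shape of the selection-then-classification workflow the abstract describes.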
