Abstract

Recognition of emotion from speech is a significant subject in human–machine interaction. In this study, the speech signal is analyzed in order to build a recognition system able to recognize human emotion, and a new feature set is proposed in the time, frequency, and time–frequency domains to increase accuracy. After extracting pitch, MFCC, wavelet, ZCR, and energy features, neural networks classify four emotions from the EMO-DB and SAVEE databases. With the combined features, accuracy on EMO-DB is 100% for two emotions, 98.48% for three emotions, and 90% for four emotions, which is better than the SAVEE results owing to EMO-DB's greater speech variety, larger number of spoken words, and distinction between male and female speakers. On SAVEE, accuracy is 97.83% for the two emotions happy and sad, 84.75% for the three emotions angry, normal, and sad, and 77.78% for the four emotions happy, angry, sad, and normal.
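
The pipeline described above starts by cutting the signal into short frames and computing per-frame features such as energy and zero-crossing rate. A minimal numpy sketch of those two features follows; the frame length and hop size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (frame/hop sizes are assumed)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Energy of each frame: sum of squared samples."""
    return np.sum(frames.astype(float) ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat zeros as positive to avoid spurious crossings
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# toy example: a signal that alternates sign every sample has ZCR near 1
x = np.tile([1.0, -1.0], 800)
frames = frame_signal(x)
print(zero_crossing_rate(frames)[0])   # -> 1.0
print(short_time_energy(frames)[0])    # -> 400.0
```

Pitch, MFCC, and wavelet features are computed from the same frames; the Fourier-domain variants mentioned in the highlights apply the same statistics to the frame spectra.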

Highlights

  • Speech is a communicative process among humans

  • 340 utterances were chosen from the Berlin database and 300 from SAVEE; 20 percent of the data were used for testing and 80 percent for training

  • As shown in the table, wavelet features alone achieve 85.29% on EMO-DB and 53.57% on SAVEE. Using the combined feature set of wavelet, Mel-Frequency Cepstral Coefficients (MFCC), energy, zero-crossing rate (ZCR), pitch, Fourier energy, Fourier ZCR, and Fourier pitch, accuracy reaches 100% on EMO-DB and 97.83% on SAVEE; overall, accuracy on the Berlin database is 2.7% better than on SAVEE
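
The MFCC features named in the combination above follow the standard chain: windowed frame → power spectrum → mel filterbank → log → DCT. The sketch below implements that chain in plain numpy; the sampling rate, FFT size, and filter counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)  # rising edge
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)  # falling edge
    return fb

def mfcc(frame, sr=16000, n_fft=512, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: |FFT|^2 -> mel filterbank -> log -> DCT-II."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft)) ** 2
    mel_energies = mel_filterbank(n_filters, n_fft, sr) @ spec
    log_mel = np.log(mel_energies + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_coeffs
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ log_mel

frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)  # 25 ms of a 440 Hz tone
print(mfcc(frame).shape)  # -> (13,)
```

Per-frame coefficient vectors like this one are what the neural network consumes, typically after averaging or otherwise pooling over all frames of an utterance.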

Summary

INTRODUCTION

Speech is a communicative process among humans. One of its most significant characteristics is the transfer of internal emotion to the listener. Firozshah et al. used MFCC and an ANN to recognize four emotions (angry, happy, neutral, and sad), with recognition accuracies of 72.05%, 66.05%, and 71.25% for women, men, and both combined, respectively [1]. Javidi et al. used MFCC, ZCR, pitch, and energy with a combination of the CHAID decision tree, regression, SVM, C5.0, and an ANN to recognize the emotions angry, happy, neutral, sad, disgust, fear, and boredom; the recognition accuracy using the ANN was 71.70% [2]. Haq et al. addressed seven emotions (angry, disgust, fear, happy, neutral, sad, and surprised) using energy, duration, MFCC, and pitch features with an MLB classifier, obtaining a 53% accuracy rate [5]. Ververidis et al. considered the emotions angry, happy, neutral, and sad; they extracted energy, formant, and pitch features and achieved an accuracy of 53.7% [6].

EMOTION SPEECH RECOGNITION SYSTEM
Framing
Windowing
Energy
CLASSIFYING MODEL OF ARTIFICIAL NEURAL NETWORK
DATABASE
IMPLEMENTATION METHOD AND ANALYSIS OF RESULTS
CONCLUSION
Findings
SUGGESTIONS FOR FUTURE WORK