Abstract

Improving the performance of emotional speaker recognition has attracted increasing interest in recent years. This paper presents a methodology for speaker recognition under different emotional states based on a multiclass Support Vector Machine (SVM) classifier. We compare two feature extraction methods for representing emotional speech utterances in order to obtain the best accuracy. The first is the traditional Mel-Frequency Cepstral Coefficients (MFCC); the second is MFCC combined with Shifted-Delta-Cepstra (MFCC-SDC). Experiments are conducted on the IEMOCAP database using two multiclass SVM approaches: One-Against-One (OAO) and One-Against-All (OAA). The results show that MFCC-SDC features outperform conventional MFCC.

Highlights

  • Emotional speaker recognition is one of the research fields in Human-Computer Interaction (HCI) or affective computing [1]

  • We propose to investigate Mel-Frequency Cepstral Coefficients combined with Shifted-Delta-Cepstra (MFCC-SDC) features to improve the performance of the speaker recognition system in emotional talking environments

  • Two multiclass Support Vector Machine (SVM) approaches, One-Against-All (OAA) and One-Against-One (OAO), were used to evaluate the proposed emotional speaker recognition system
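
The two multiclass strategies named above differ in how they split a multi-speaker problem into binary SVM problems. The sketch below only enumerates those binary problems and trains nothing. The function names and the speaker labels are hypothetical, chosen for illustration; they are not taken from the paper.

```python
from itertools import combinations


def one_against_all_tasks(labels):
    """One binary problem per class: that class (+1) versus all others (-1)."""
    classes = sorted(set(labels))
    return [(c, [1 if y == c else -1 for y in labels]) for c in classes]


def one_against_one_tasks(labels):
    """One binary problem per unordered class pair, using only samples of that pair."""
    classes = sorted(set(labels))
    tasks = []
    for a, b in combinations(classes, 2):
        # Keep only the utterances that belong to one of the two classes.
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        targets = [1 if labels[i] == a else -1 for i in idx]
        tasks.append(((a, b), idx, targets))
    return tasks


# Hypothetical speaker labels for five utterances.
labels = ["spk1", "spk2", "spk3", "spk1", "spk4"]
oaa = one_against_all_tasks(labels)
oao = one_against_one_tasks(labels)
```

For K speakers, OAA yields K binary problems, each trained on all utterances, while OAO yields K(K-1)/2 problems, each trained on the utterances of two speakers only.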


Summary

INTRODUCTION

Emotional speaker recognition is one of the research fields in Human-Computer Interaction (HCI) or affective computing [1]. The main motivation is the desire to develop human-machine interfaces that are more intelligent, adaptive and credible. This may give computers the ability to recognize a person in such contexts for many real applications. Speaker recognition in an emotional context can be used in criminal or forensic investigations to identify the suspect who produced the emotional utterances. It can also be used in telecommunications to improve the performance of telephone-based speech recognition. We propose to investigate MFCC-SDC features to improve the performance of the speaker recognition system in emotional talking environments.
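
As an illustration of the MFCC-SDC features mentioned above, here is a minimal sketch of the usual shifted-delta computation: for each frame t, k delta blocks c(t + iP + d) - c(t + iP - d), i = 0..k-1, are stacked and then appended to the static coefficients. The parameter values (d=1, P=3, k=7) and the edge handling (clamping indices at the first and last frame) are assumptions for this example, not settings reported in this summary, and the function names are hypothetical.

```python
def shifted_delta_cepstra(frames, d=1, p=3, k=7):
    """Compute SDC vectors from per-frame cepstral vectors (lists of floats).

    d: delta spread, p: shift between blocks, k: number of stacked blocks.
    These defaults are assumed values, not taken from the paper.
    """
    n_frames = len(frames)

    def frame_at(idx):
        # Assumed edge handling: repeat the first/last frame beyond the ends.
        return frames[min(max(idx, 0), n_frames - 1)]

    sdc = []
    for t in range(n_frames):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


def mfcc_sdc(frames, d=1, p=3, k=7):
    """Concatenate each static cepstral vector with its SDC vector."""
    deltas = shifted_delta_cepstra(frames, d, p, k)
    return [list(c) + s for c, s in zip(frames, deltas)]


# Toy input: 30 frames of 2 synthetic "cepstral" coefficients each.
toy_frames = [[float(t), 2.0 * t] for t in range(30)]
toy_sdc = shifted_delta_cepstra(toy_frames, d=1, p=3, k=2)
toy_features = mfcc_sdc(toy_frames)
```

With N static coefficients per frame, the combined vector has N + N*k dimensions, so the toy input above gives 2 + 2*7 = 16 values per frame.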

SYSTEM DESIGN
FEATURE EXTRACTION
Mel Frequency Cepstral Coefficients
Shifted Delta Cepstra features
CLASSIFICATION
Emotional Database
Experimental setup
Results
Findings
CONCLUSION