Speaker Recognition in Emotional Environment using Excitation Features

Tintu Thomas, N.V Sobhana, Shashidhar G Koolagudi, Spoorthy V

doi:10.1109/icaecc50550.2020.9339501

Abstract

Speaker recognition is the task of identifying a person from his or her speech. It has many applications, such as transaction authentication, access control, voice dialing, and web services. Emotive speaker recognition is important because, in real life, human beings extensively express emotions during conversations, and emotions alter the human voice. This work proposes a text-independent speaker recognition system designed for emotional environments. The system is trained on speech samples recorded in a neutral environment and evaluated in an emotional environment. Excitation source features are used to represent the speaker-specific details contained in the speech signal. The excitation source signal is obtained after separating the segmental-level features from the voice samples. Because the excitation source signal is regarded as almost noise-like, identifying a speaker in an emotive environment is a challenging task. The excitation features include Linear Prediction (LP) residual, Glottal Closure Instance (GCI), LP residual phase, residual cepstrum, and Residual Mel-Frequency Cepstral Coefficient (R-MFCC), among others. A decrease in performance is observed when the system is trained with neutral speech samples and tested with emotional speech samples. The emotions considered for emotional speaker identification are happy, sad, anger, fear, neutral, surprise, disgust, and sarcastic. The algorithms used to classify speakers are Gaussian Mixture Model (GMM), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest, and Naive Bayes.
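As a rough illustration of how an LP residual can be obtained from one speech frame, the sketch below fits a linear predictor and subtracts the predicted samples from the actual ones. It is not the authors' implementation: the autocorrelation method with the Levinson-Durbin recursion, the default order of 10, and the function names `autocorrelation`, `levinson_durbin`, and `lp_residual` are all assumptions made for this example.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations with the Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order] such that
    s[n] is approximated by sum_k a[k] * s[n - k].
    """
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy, starts at the frame energy
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lp_residual(frame, order=10):
    """Return the LP residual: each sample minus its linear prediction."""
    coeffs = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n in range(len(frame)):
        # Samples before the start of the frame are treated as zero (assumption).
        pred = sum(coeffs[k] * frame[n - 1 - k] for k in range(min(order, n)))
        residual.append(frame[n] - pred)
    return residual
```

Calling `lp_residual(frame)` on a list of samples returns a residual of the same length. In practice, the frame would usually be windowed and pre-emphasized first; those steps are omitted here to keep the sketch short.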

Full Text