Abstract

Emotion recognition has recently become very important in the field of speech and speaker recognition. This paper experimentally investigates which acoustic features are best suited to gender-dependent speaker recognition from emotional speech. Four feature sets, LPC (Linear Prediction Coefficients), LPCC (Linear Prediction Cepstral Coefficients), MFCC (Mel-Frequency Cepstral Coefficients) and PLP (Perceptual Linear Prediction) coefficients, were compared in an experimental speaker recognition system based on the i-vector representation. The system was evaluated on emotional speech recordings from a newly created Slovak emotional database, with the Mahalanobis distance metric used as the scoring method. The results showed that the MFCC representation is best suited to speaker verification from Slovak emotional speech, with a recognition rate higher than 80%.

Highlights

  • Emotions play a very important part in everyday human communication

  • Emotional databases in foreign languages were used for speaker recognition, and the results showed the contribution of emotional speech to the speaker recognition process

  • The final Mahalanobis scoring is defined as $\mathrm{score}(w_1, w_2) = -(w_1 - w_2)^{\top} W_s^{-1} (w_1 - w_2)$, where $w_1$ and $w_2$ are two i-vectors scored by the log-probability that $w_1$ and $w_2$ belong to the same class, in accordance with the covariance matrix $W_s$
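
The Mahalanobis scoring mentioned in the last highlight can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the score is the negated squared Mahalanobis distance between two i-vectors under a within-class covariance matrix, and the helper names `mahalanobis_score` and `within_class_covariance`, the toy data and all dimensions are hypothetical.

```python
import numpy as np


def mahalanobis_score(w1, w2, W):
    """Negated squared Mahalanobis distance between two i-vectors.

    W is assumed to be a symmetric, positive-definite covariance matrix.
    A higher (less negative) score means the vectors are closer.
    """
    d = np.asarray(w1, dtype=float) - np.asarray(w2, dtype=float)
    # Solve W x = d rather than forming the explicit inverse of W.
    return -float(d @ np.linalg.solve(W, d))


def within_class_covariance(ivectors, labels):
    """Pooled within-class covariance (hypothetical helper for this sketch).

    Each vector is centered on the mean of its own class before the
    outer products are accumulated.
    """
    X = np.asarray(ivectors, dtype=float)
    y = np.asarray(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for spk in np.unique(y):
        Xc = X[y == spk] - X[y == spk].mean(axis=0)
        W += Xc.T @ Xc
    return W / X.shape[0]


# Toy data: 3 synthetic "speakers", 20 four-dimensional vectors each.
rng = np.random.default_rng(0)
labels_toy = np.repeat(np.arange(3), 20)
means_toy = 10.0 * np.arange(3)[:, None] * np.ones((3, 4))
ivectors_toy = means_toy[labels_toy] + rng.normal(size=(60, 4))

W_toy = within_class_covariance(ivectors_toy, labels_toy)
same_score = mahalanobis_score(ivectors_toy[0], ivectors_toy[1], W_toy)
diff_score = mahalanobis_score(ivectors_toy[0], ivectors_toy[20], W_toy)
print("same speaker:", same_score, "different speakers:", diff_score)
```

In a verification setting, such a score would typically be compared against a threshold to accept or reject the claimed identity; how that threshold was chosen is not stated in this summary.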

Summary

INTRODUCTION

Human interaction is multimodal: it covers not only speech but also gestures, facial expressions and “body language”. In this kind of communication, emotions provide very important background information, according to which we are able to identify the meaning of the explicit spoken message [1]. Several emotional databases in foreign languages were available for this work, namely the EMA (Electromagnetic Articulography) database [6], EMO DB (Berlin emotional database) [7], EESC (Estonian emotional speech corpus) [8] and BAUM-2 [9]. None of these corpora satisfied the requirements on the volume of emotional speaker data needed to train the speaker model properly. The main idea of i-vectors is to use only one low-dimensional subspace of the GMM supervector space. This space, which represents both speaker and channel variability, is called the total variability space.
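
The total variability idea can be made concrete with the standard i-vector model, in which an utterance-dependent GMM supervector M is written as M = m + Tw, where m is the speaker-independent (UBM) supervector, T is a low-rank total variability matrix and w is the i-vector. The sketch below only illustrates the dimensions involved. All sizes and the random stand-ins for m and T are assumptions, and the least-squares recovery of w is a simplification: a real extractor estimates w as a posterior mean from Baum-Welch statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen only for illustration (not taken from the paper).
n_components, feat_dim, ivec_dim = 8, 13, 5
sv_dim = n_components * feat_dim          # supervector = stacked GMM means

m = rng.normal(size=sv_dim)               # stand-in for the UBM mean supervector
T = rng.normal(size=(sv_dim, ivec_dim))   # stand-in for the total variability matrix
w_true = rng.normal(size=ivec_dim)        # low-dimensional i-vector

# Utterance-dependent supervector under the model M = m + T w.
M = m + T @ w_true

# Simplified recovery of w by least squares on the noise-free supervector.
w_est, *_ = np.linalg.lstsq(T, M - m, rcond=None)

print("supervector dimension:", M.shape[0])
print("i-vector dimension:", w_est.shape[0])
```

The point of the example is the change in dimensionality: a 104-dimensional supervector is summarized by a 5-dimensional vector that carries both speaker and channel variability in a single subspace.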

SYSTEM DESCRIPTION
Total variability matrix training
Testing
EMOTIONAL DATABASE
EXPERIMENTAL SETUP
Findings
OF RESULTS AND CONCLUSION
