Abstract

Emotion recognition aims at automatically identifying the emotional or physical state of a human being from his or her voice. The emotional and physical states of a speaker are known as emotional aspects of speech and belong to the so-called paralinguistic aspects. Although the emotional state does not alter the linguistic content, it is an important factor in human communication because it provides feedback information in many applications; making a machine recognize emotions from speech is not a new idea. This paper presents an automatic, text-independent speaker emotion recognition system using pattern classification methods such as the support vector machine (SVM). Acoustic features are derived from the speech signal at the segmental level, i.e., they are extracted from short frames (10-30 ms) of speech. The acoustic features are represented by Mel frequency cepstral coefficients (MFCCs), and a 39-dimensional MFCC vector for each frame is used as the acoustic feature vector. The DFT-based cepstral coefficients are computed by taking the IDFT (inverse DFT) of the log magnitude short-time spectrum of the speech signal. The Mel-warped cepstrum is obtained by inserting an intermediate step that transforms the frequencies before computing the IDFT; the Mel scale is based on human perception of the frequency of sound. SVMs are used to construct the optimal separating hyperplane for the speech features: they are used to build the models for each speaker and to compare them with the test speaker's feature vectors.
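To make the described pipeline concrete, the sketch below extracts 39-dimensional MFCC vectors (13 static coefficients plus their delta and delta-delta derivatives) from short frames and classifies them with an SVM. This is a minimal illustration assuming the librosa and scikit-learn Python libraries; the file names, emotion labels, frame sizes, and SVM kernel settings are hypothetical placeholders, not the paper's exact configuration.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def extract_mfcc_features(path, sr=16000):
    """Extract 39-dim MFCC vectors (13 static + delta + delta-delta)
    from short frames of a speech file."""
    y, sr = librosa.load(path, sr=sr)
    # 25 ms windows with a 10 ms hop fall in the 10-30 ms segmental range.
    # The Mel filterbank warps frequencies to the perceptual Mel scale
    # (commonly m = 2595 * log10(1 + f / 700)) before the cepstrum is taken.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=int(0.025 * sr),
                                hop_length=int(0.010 * sr))
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    # Stack into a (num_frames, 39) feature matrix.
    return np.vstack([mfcc, delta, delta2]).T

# Hypothetical training data: (wav_path, emotion_label) pairs.
train_files = [("happy_01.wav", "happy"), ("angry_01.wav", "angry")]

X_train, y_train = [], []
for path, label in train_files:
    feats = extract_mfcc_features(path)
    X_train.append(feats)
    y_train.extend([label] * len(feats))
X_train = np.vstack(X_train)

# An RBF-kernel SVM constructs the separating hyperplane in the kernel's
# feature space; the paper's exact kernel and parameters are not specified.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

# Classify a test utterance by majority vote over its frame-level predictions.
test_feats = extract_mfcc_features("test_utterance.wav")
frame_preds = clf.predict(test_feats)
labels, counts = np.unique(frame_preds, return_counts=True)
print("Predicted emotion:", labels[np.argmax(counts)])
```

Frame-level prediction followed by a majority vote is one common way to turn per-frame SVM decisions into an utterance-level label; the paper may aggregate scores differently.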
