Abstract

Emotion classification from emotional speech remains a challenging research problem. Few studies have attempted both to discriminate amongst a set of emotions and to categorize them by valence, activation and dominance. Discriminating between high-arousal and low-arousal emotions is itself difficult, and discriminating amongst emotions within each subcategory is more difficult still. In this study, a new approach is proposed to discriminate between high- and low-arousal emotions, and also amongst the emotions within each subcategory. Mahalanobis distances amongst acoustic feature vectors of emotional speech with respect to normal speech are examined. The approach, based on speech production features, is validated on three databases: German (Berlin EMO-DB), English (RAVDESS) and Telugu (IITKGP-SESC). A common set of five emotions, Angry, Happy, Fear, Disgust and Sad, is examined with reference to normal speech. The vocal-tract filter features, Mel-frequency cepstral coefficients (MFCCs), and the combined source-filter features, signal energy, zero-crossing rate and duration, are used. A 2D projection of the Mahalanobis distance for one emotion, with respect to normal speech, onto another emotion is observed to discriminate amongst emotions within each high/low-arousal subcategory. Angry and Happy are discriminated within the high-arousal subcategory, whereas Fear, Disgust and Sad are discriminated within the low-arousal subcategory. This study should help in further classifying emotions within each subcategory of high/low-arousal emotions in emotional speech.
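As an illustration of the measurement underlying this approach, the minimal sketch below computes the Mahalanobis distance of frame-level MFCC vectors of one emotional utterance with respect to normal speech. It assumes Python with librosa and scipy; the file names, the number of coefficients and the frame-averaging step are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import librosa
from scipy.spatial.distance import mahalanobis

def mfcc_features(path, n_mfcc=13):
    """Frame-level MFCC vectors (frames x n_mfcc) for one utterance."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Hypothetical file names; any normal vs. emotional utterances would do.
normal = mfcc_features("neutral_utterance.wav")
angry = mfcc_features("angry_utterance.wav")

# The normal-speech frames define the reference distribution:
# mean vector and (pseudo-)inverse covariance matrix.
mu = normal.mean(axis=0)
vi = np.linalg.pinv(np.cov(normal, rowvar=False))

# Average Mahalanobis distance of the emotional frames from normal speech.
d = np.mean([mahalanobis(frame, mu, vi) for frame in angry])
print(f"Mahalanobis distance (angry w.r.t. normal): {d:.2f}")
```

Repeating this for each emotion, and projecting the per-emotion distances against one another in 2D, is the kind of comparison the abstract describes; the exact projection used in the paper is not specified here.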
