Employing Emotion Cues to Verify Speakers in Emotional Talking Environments

Ismail Shahin

doi:10.1515/jisys-2014-0118

Abstract

Usually, people talk neutrally in environments free of abnormal talking conditions such as stress and emotion. Emotional conditions that might affect people's talking tone include happiness, anger, and sadness, and such emotions are directly affected by the speaker's health status. Speakers can be verified easily in neutral talking environments but not in emotional ones; consequently, speaker verification systems perform worse in emotional talking environments than in neutral talking environments. In this work, a two-stage approach is employed and evaluated to improve speaker verification performance in emotional talking environments. The approach exploits the speaker's emotion cues (a text-independent, emotion-dependent speaker verification problem) and uses both hidden Markov models (HMMs) and suprasegmental HMMs as classifiers. It consists of two cascaded stages that combine an emotion recognizer and a speaker recognizer into a single recognizer. The architecture has been tested on two separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results show that the proposed approach yields a significant improvement over previous studies and over other approaches, such as an emotion-independent speaker verification approach and an emotion-dependent speaker verification approach based entirely on HMMs.
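The two-stage cascade (emotion recognizer followed by speaker recognizer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. Each HMM or suprasegmental HMM is replaced by a hypothetical single diagonal Gaussian (`DiagGaussian`). Stage 2 is assumed to be a log-likelihood-ratio test of the claimed speaker's emotion-specific model against an emotion-specific background model, using a hypothetical `threshold`. All names and the toy one-dimensional features are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per analysis frame


@dataclass
class DiagGaussian:
    """Hypothetical stand-in for an HMM or suprasegmental HMM: a single
    diagonal Gaussian that scores an utterance by its average per-frame
    log-likelihood. The paper's classifiers are HMMs/SHMMs, not this."""
    mean: List[float]
    var: List[float]

    def log_likelihood(self, frames: Sequence[Frame]) -> float:
        total = 0.0
        for x in frames:
            for xi, mu, v in zip(x, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        return total / len(frames)


def recognize_emotion(frames: Sequence[Frame],
                      emotion_models: Dict[str, DiagGaussian]) -> str:
    """Stage 1: return the emotion whose model scores the utterance highest."""
    return max(emotion_models,
               key=lambda e: emotion_models[e].log_likelihood(frames))


def verify_speaker(frames: Sequence[Frame], claimed: str,
                   emotion_models: Dict[str, DiagGaussian],
                   speaker_models: Dict[str, Dict[str, DiagGaussian]],
                   background_models: Dict[str, DiagGaussian],
                   threshold: float = 0.0) -> Tuple[str, float, bool]:
    """Stage 2: score the claimed speaker with the model trained under the
    emotion detected in stage 1, normalized by an emotion-specific background
    model (an assumed log-likelihood-ratio test).
    Returns (emotion, score, accepted)."""
    emotion = recognize_emotion(frames, emotion_models)
    claimant = speaker_models[claimed][emotion]
    score = (claimant.log_likelihood(frames)
             - background_models[emotion].log_likelihood(frames))
    return emotion, score, score >= threshold


if __name__ == "__main__":
    emotion_models = {"neutral": DiagGaussian([0.0], [1.0]),
                      "angry": DiagGaussian([5.0], [1.0])}
    speaker_models = {
        "alice": {"neutral": DiagGaussian([0.2], [1.0]),
                  "angry": DiagGaussian([5.1], [1.0])},
        "bob": {"neutral": DiagGaussian([1.5], [1.0]),
                "angry": DiagGaussian([7.0], [1.0])},
    }
    background_models = {"neutral": DiagGaussian([0.5], [2.0]),
                         "angry": DiagGaussian([6.0], [2.0])}
    utterance = [[5.0], [5.2]]  # toy 1-D features from an "angry" utterance
    # With these toy models, alice's claim is accepted and bob's is rejected.
    print(verify_speaker(utterance, "alice", emotion_models,
                         speaker_models, background_models))
    print(verify_speaker(utterance, "bob", emotion_models,
                         speaker_models, background_models))
```

In this sketch, routing the claim through the recognized emotion means the claimant is scored by a model trained under matching emotional conditions; the trade-off of cascading is that an emotion-recognition error in stage 1 selects the wrong speaker model in stage 2.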
