Abstract

Studies have shown that emotional variability in speech degrades the performance of speaker recognition tasks. Of particular interest is the error produced by the mismatch between training speaker recognition models on neutral speech and testing them on expressive speech. While previous studies have considered categorical emotions, expressive speech during human interaction conveys subtle behaviors that are better characterized with continuous descriptors (e.g., attributes such as arousal, valence, and dominance). As the emotional content becomes more intense, we expect the performance of speaker recognition tasks to drop. Can we define emotional regions for which speaker recognition performance is expected to be reliable? This study focuses on automatically predicting reliable regions for speaker recognition by analyzing and predicting the emotional content. We collected a unique emotional database from 80 speakers. We estimate speaker recognition performance as a function of arousal and valence, creating regions in this space where we can reliably recognize the identity of a speaker. Then, we train speech emotion recognizers designed to predict whether the emotional content in a sentence is within the reliable region. The experimental evaluation demonstrates that sentences classified as reliable for speaker recognition tasks have a lower equal error rate (EER) than sentences considered unreliable.

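To make the evaluation protocol concrete, the sketch below illustrates the general idea of gating verification trials by where a sentence falls in arousal-valence space and comparing EER inside versus outside the reliable region. Everything here is an assumption for illustration: the region boundaries (arousal_max, valence_min), the function names, and the randomly generated scores and emotion attributes are placeholders, not the paper's actual model, thresholds, or data.

```python
import numpy as np


def in_reliable_region(arousal, valence, arousal_max=0.6, valence_min=0.4):
    """Return True if the (arousal, valence) point lies inside an assumed
    'reliable' region for speaker recognition. Thresholds are illustrative
    placeholders, not values reported in the paper."""
    return arousal <= arousal_max and valence >= valence_min


def equal_error_rate(scores, labels):
    """Compute the EER of a verification system from trial scores
    (higher = more likely same speaker) and labels (1 = target, 0 = impostor)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(scores)
    # False acceptance rate: impostor trials accepted at each threshold.
    far = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    # False rejection rate: target trials rejected at each threshold.
    frr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


# Placeholder trial data; in practice, arousal/valence would come from a
# speech emotion recognizer and scores from a speaker verification backend.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
labels = rng.integers(0, 2, size=1000)
arousal = rng.random(1000)
valence = rng.random(1000)

reliable = np.array([in_reliable_region(a, v) for a, v in zip(arousal, valence)])
print(f"EER (reliable region):   {equal_error_rate(scores[reliable], labels[reliable]):.3f}")
print(f"EER (unreliable region): {equal_error_rate(scores[~reliable], labels[~reliable]):.3f}")
```

With real verification scores and predicted emotional attributes, the expectation described in the abstract is that the first figure (trials gated into the reliable region) comes out lower than the second.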