Abstract

Emotional voice conversion (EVC) aims to convert speech from one emotional state to another while keeping the linguistic content, speaker identity, and other emotion-independent information unchanged. Because previous studies were limited to a fixed set of emotions, converting to emotions never seen during the training stage remains challenging. In this paper, we propose a one-shot emotional voice conversion model based on feature separation. The proposed method controls emotional characteristics with Global Emotion Embeddings (GEEs) and introduces activation guidance (AG) and mutual information (MI) minimization to reduce the correlation between the emotion embedding and the emotion-independent representation. At run-time, it can produce the desired emotional utterance from a single pair of utterances without any emotion labels, whether or not the target emotion appears in the training set. Subjective and objective evaluations validate the effectiveness of the proposed model in both the degree of feature separation and emotion expressiveness, and show that it can even achieve unseen emotion conversion.
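To illustrate the MI-minimization idea mentioned above, here is a minimal sketch that penalizes mutual information between the emotion embedding and the emotion-independent representation, assuming a CLUB-style variational upper bound (Cheng et al., 2020). The abstract does not specify which estimator the paper uses, so the estimator choice, network shapes, and all names here (CLUBEstimator, mi_upper_bound, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: CLUB-style upper bound on I(emotion; content),
# used as a penalty to separate the two representations.
import torch
import torch.nn as nn


class CLUBEstimator(nn.Module):
    """Bounds I(emotion; content) from above via a variational q(e | c)."""

    def __init__(self, content_dim: int, emotion_dim: int, hidden_dim: int = 256):
        super().__init__()
        # q(e | c) is a diagonal Gaussian whose mean and log-variance
        # are predicted from the content representation.
        self.mu_net = nn.Sequential(
            nn.Linear(content_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, emotion_dim))
        self.logvar_net = nn.Sequential(
            nn.Linear(content_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, emotion_dim), nn.Tanh())

    def log_likelihood(self, content: torch.Tensor, emotion: torch.Tensor) -> torch.Tensor:
        # log q(emotion | content), constants dropped (they cancel in the bound)
        mu, logvar = self.mu_net(content), self.logvar_net(content)
        return (-(emotion - mu) ** 2 / logvar.exp() - logvar).sum(dim=1)

    def mi_upper_bound(self, content: torch.Tensor, emotion: torch.Tensor) -> torch.Tensor:
        # CLUB bound: likelihood of paired samples minus likelihood of
        # shuffled (marginal) samples; minimizing this reduces I(e; c).
        pos = self.log_likelihood(content, emotion)
        neg = self.log_likelihood(content, emotion[torch.randperm(emotion.size(0))])
        return (pos - neg).mean()
```

In a typical training loop of this kind, the variational network is first updated to maximize log_likelihood on paired samples, and the encoders are then updated with mi_upper_bound added to the reconstruction loss, discouraging emotion information from leaking into the emotion-independent representation.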
