Arabic Emotional Voice Conversion Using English Pre-Trained StarGANv2-VC-Based Model

Ali H Meftah, Sid-Ahmed Selouani, Yousef A Alotaibi

doi:10.3390/app122312159

Open Access

Journal: Applied Sciences	Publication Date: Nov 28, 2022
Citations: 1	License type: CC BY 4.0

Affiliation: King Saud University, Université de Moncton

Abstract

Emotional voice conversion (EVC) aims to change the emotional state of a speaker’s voice while preserving the speaker’s identity and the linguistic content of the message. Research on EVC for Arabic lags well behind that for more widely used languages such as English. The primary objective of this study is to determine whether Arabic emotions can be converted using a model trained for another language. We used StarGANv2-VC, an unsupervised, many-to-many, non-parallel voice conversion (VC) model based on a generative adversarial network (GAN), to perform Arabic EVC (A-EVC). The conversion relies on phoneme-level automatic speech recognition (ASR) and fundamental frequency (F0) models pre-trained on English. The generated speech is evaluated in terms of prosody and spectrum conversion, as well as by automatic emotion recognition and speaker identification using a convolutional recurrent neural network (CRNN). The evaluation showed that male voices scored higher than female voices and that conversions from neutral to other emotions scored higher than conversions from other emotions.
