Emotion conversion is an active research area within emotional speech synthesis. This work converts a neutral speech sentence into a target emotional speech sentence using signal processing techniques. The parameters used for emotion conversion are the pitch contour, the intensity, and the duration of the sentence. A Kannada Emotional Speech (KES) database was created and used for analysis. It covers four emotions (sadness, happiness, anger, and fear) in addition to neutral speech. The pitch contours of the different emotional sentences are analyzed, and a Gaussian Regression Model (GRM) is proposed for predicting the target pitch contour. The proposed method is evaluated with objective and subjective tests. The objective test uses the mean pitch, the standard deviation of pitch, the mean intensity, and the duration of the sentences. The subjective test computes the Emotion Recognition Rate (ERR) from a confusion matrix and collects Mean Opinion Score (MOS) ratings of the conversion system on a scale of 1 to 5. The subjective test results indicate that the effectiveness and discernibility of the emotion improve when GRM is used for pitch contour modification together with intensity and duration modification. The most recognized emotion was sadness, with an MOS of 3.52 and an ERR of 83%, and the least recognized emotion was anger, with an MOS of 1.74 and an ERR of 66%. The results of the subjective and objective tests show that the converted sadness, happiness, and fear speech is very close to natural sadness, anger, and fear speech.
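
The subjective measures named above can be illustrated with a minimal sketch. The abstract does not define ERR precisely, so this sketch assumes it is the per-emotion share of listener responses that land on the diagonal of the confusion matrix (rows are intended emotions, columns are perceived emotions), and that MOS is the arithmetic mean of the 1 to 5 ratings. The function names and all counts below are hypothetical and are not data from the study.

```python
def emotion_recognition_rates(confusion, labels):
    """Per-emotion recognition rate in percent.

    Assumption: rate = correct responses (diagonal) / all responses in that row.
    """
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        rates[label] = 100.0 * confusion[i][i] / row_total if row_total else 0.0
    return rates


def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings on the 1-5 scale."""
    return sum(ratings) / len(ratings)


labels = ["sadness", "happiness", "anger", "fear"]

# Hypothetical listener counts, NOT taken from the paper.
# Row = intended (converted) emotion, column = emotion the listener chose.
confusion = [
    [8, 1, 0, 1],
    [1, 7, 1, 1],
    [0, 2, 6, 2],
    [1, 1, 1, 7],
]

rates = emotion_recognition_rates(confusion, labels)
for label in labels:
    print(f"{label}: {rates[label]:.1f}%")

# Hypothetical ratings from four listeners for one converted sentence.
print(mean_opinion_score([4, 3, 5, 4]))
```

Normalizing each row by its own total keeps the rates comparable when the number of test sentences differs between emotions.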