Abstract

This paper proposes a facial-expression-integrated sign language to emotional speech conversion method to address the communication problems between healthy people and people with speech disorders. Firstly, the characteristics of sign language and the features of facial expression are extracted by a deep neural network (DNN) model. Secondly, a support vector machine (SVM) is trained to classify the sign language and facial expression, recognizing the text of the sign language and the emotional tags of the facial expression. At the same time, a hidden Markov model-based Mandarin-Tibetan bilingual emotional speech synthesizer is trained by speaker adaptive training with a Mandarin emotional speech corpus. Finally, Mandarin or Tibetan emotional speech is synthesized from the recognized text of sign language and the emotional tags. Objective tests show that the recognition rate for static sign language is 90.7%, and the recognition rate of facial expression reaches 94.6% on the extended Cohn-Kanade database (CK+) and 80.3% on the JAFFE database. Subjective evaluation demonstrates that the synthesized emotional speech achieves an emotional mean opinion score of 4.0. The pleasure-arousal-dominance (PAD) three-dimensional emotion model is employed to evaluate the PAD values of both the facial expressions and the synthesized emotional speech. The results show that the PAD values of the facial expressions are close to those of the synthesized emotional speech, which means that the synthesized emotional speech can express the emotions conveyed by the facial expressions.
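As a rough illustration of the recognition stage described above (not the authors' implementation), the following Python sketch shows a DNN-style feature extractor feeding an SVM classifier; the function `extract_dnn_features` and the toy data are hypothetical placeholders introduced only to make the example runnable.

```python
# Minimal sketch of the recognition step: DNN-derived features classified by an SVM.
# `extract_dnn_features` is a hypothetical stand-in for the DNN feature extractor.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def extract_dnn_features(images: np.ndarray) -> np.ndarray:
    """Placeholder for the DNN feature extractor; here it simply flattens
    each image so the example runs end to end."""
    return images.reshape(len(images), -1)

# Toy data standing in for sign-language (or facial-expression) images and labels.
rng = np.random.default_rng(0)
images = rng.random((40, 32, 32))      # 40 fake 32x32 grayscale frames
labels = rng.integers(0, 4, size=40)   # 4 fake classes

features = extract_dnn_features(images)

# SVM classifier over the DNN features, as in the recognition step of the pipeline.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(features, labels)

# In the full system, the predicted label would be mapped to sign-language text
# (or to an emotional tag for facial expression) and passed on to the HMM-based
# Mandarin-Tibetan bilingual emotional speech synthesizer.
print(clf.predict(features[:5]))
```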
