Abstract

This paper proposes a framework for converting sign language to emotional speech based on deep learning. We first adopt a deep neural network (DNN) model to extract features of sign language and facial expressions. We then train two support vector machines (SVMs) to classify the sign language and the facial expressions, recognizing the text of the sign language and the emotion tags of the facial expressions. We also train a set of DNN-based emotional speech acoustic models by speaker adaptive training on a multi-speaker emotional speech corpus. Finally, we select the DNN-based emotional speech acoustic model matching the recognized emotion tag to synthesize emotional speech from the text recognized from the sign language. Objective tests show that the recognition rate for static sign language is 90.7%, and the facial expression recognition rate reaches 94.6% on the extended Cohn-Kanade (CK+) database and 80.3% on the Japanese Female Facial Expression (JAFFE) database. Subjective evaluation demonstrates that the synthesized emotional speech achieves an emotional mean opinion score of 4.2. The pleasure-arousal-dominance (PAD) three-dimensional emotion model is employed to evaluate the PAD values of both the facial expressions and the synthesized emotional speech. Results show that the PAD values of the facial expressions are close to those of the synthesized emotional speech, indicating that the synthesized speech can express the emotions conveyed by the facial expressions.
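
The recognition front end described above pairs DNN feature extraction with SVM classification. The sketch below illustrates that two-stage pattern in Python (PyTorch + scikit-learn); all names, dimensions, and data are illustrative assumptions rather than the authors' implementation, and the same pattern applies to both the sign-language and facial-expression recognizers.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

# Toy DNN feature extractor; in the paper a trained DNN plays this role.
# Here the weights are untrained and the data are synthetic stand-ins.
feature_dnn = nn.Sequential(
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
)

X = torch.randn(200, 1024)              # flattened sign-language images (synthetic)
y = np.random.randint(0, 30, size=200)  # sign-language class labels (synthetic)

with torch.no_grad():
    feats = feature_dnn(X).numpy()      # step 1: DNN feature extraction

svm = SVC(kernel="rbf").fit(feats, y)   # step 2: SVM classifies the DNN features
predicted_ids = svm.predict(feats)      # recognized sign-language classes / emotion tags
```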
