Abstract

Visual speech is hard to recreate by hand because animation is a time-consuming task: both precision and detail must meet the expectations of the developers and, above all, of the audience. To address this problem, several approaches have been designed to help accelerate the animation of character faces, such as procedural animation and speech-lip synchronization, with Computer Vision and Machine Learning being the most common research areas for these methods. However, these tools generally suffer from at least one of the following problems: difficulty adapting to another language, subject, or animation software; high hardware requirements; or results that are perceived as robotic. Our work presents a Deep Learning model for automatic expressive facial animation driven by audio. We extract generic audio features from expressive, phoneme-rich speech recordings for language-independent speech processing and emotion recognition. From the training videos, we extract facial landmarks for frame-to-speech alignment so the model learns the animation of phoneme pronunciation. We evaluated four variants of our model (two loss functions, each with and without emotion conditioning) through a user perception survey, in which the variant trained with a reconstruction loss and emotion conditioning produced the most natural results and the best synchronization scores, with the approval of the majority of respondents. For perceived naturalness it obtained 38.89% of the total approval votes, and for language synchronization it obtained the highest average score, 65.55% (98.33 out of 150 points), across English, German, and Korean.
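
To make the described pipeline concrete, the following is a minimal sketch of an audio-to-landmark model with emotion conditioning trained under an L2 reconstruction loss, assuming PyTorch. The feature dimensions, layer choices, class and variable names (AudioToLandmarks, n_audio_feats, n_landmarks, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): per-frame audio features plus a
# learned emotion embedding are mapped to per-frame facial landmarks, and the
# model is trained with an L2 (reconstruction) loss against landmarks extracted
# from the training videos.
import torch
import torch.nn as nn

class AudioToLandmarks(nn.Module):
    def __init__(self, n_audio_feats=26, n_emotions=7, emb_dim=16,
                 hidden=128, n_landmarks=68):
        super().__init__()
        self.emotion_emb = nn.Embedding(n_emotions, emb_dim)
        self.encoder = nn.GRU(n_audio_feats + emb_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, n_landmarks * 2)  # (x, y) per landmark

    def forward(self, audio_feats, emotion_id):
        # audio_feats: (batch, frames, n_audio_feats); emotion_id: (batch,)
        emo = self.emotion_emb(emotion_id)                        # (batch, emb_dim)
        emo = emo.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        h, _ = self.encoder(torch.cat([audio_feats, emo], dim=-1))
        return self.decoder(h)                                    # (batch, frames, n_landmarks*2)

model = AudioToLandmarks()
recon_loss = nn.MSELoss()  # reconstruction loss between predicted and target landmarks

# One illustrative training step on dummy data
audio = torch.randn(4, 100, 26)           # e.g. MFCC-style per-frame audio features
emotion = torch.randint(0, 7, (4,))       # one emotion label per clip (emotion conditioning)
target = torch.randn(4, 100, 68 * 2)      # landmarks extracted from the training video frames
pred = model(audio, emotion)
loss = recon_loss(pred, target)
loss.backward()
```

Dropping the emotion embedding from the input would correspond to the unconditioned variants mentioned above, and swapping nn.MSELoss for another criterion would correspond to the second loss function evaluated in the survey.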
