Abstract

Research in affective computing and cognitive science has shown the importance of emotional facial and vocal expressions during human-computer and human-human interactions. However, while models exist to control the display and interactive dynamics of emotional expressions, such as smiles, in embodied agents, these techniques cannot be applied to video interactions between humans. In this work, we propose an audiovisual smile transformation algorithm that manipulates an incoming video stream in real time to parametrically control the amount of smile seen on the user's face and heard in their voice, while preserving other characteristics such as the user's identity and the timing and content of the interaction. The transformation consists of separate audio and visual pipelines, both based on a warping technique informed by real-time detection of audio and visual landmarks. Together, these two pipelines constitute a single audiovisual algorithm which, in addition to providing simultaneous real-time transformations of a real person's face and voice, makes it possible to investigate how the two modalities of smiling are integrated in real-world social interactions.
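To make the visual half of this description concrete, the sketch below shows one plausible form of landmark-guided warping on a live video stream: facial landmarks are detected per frame, and a local deformation field parametrically pulls the mouth corners up and outward. This is not the authors' implementation; the library choices (OpenCV, MediaPipe Face Mesh), the mouth-corner landmark indices, and the Gaussian deformation model are assumptions chosen purely for illustration.

```python
# Hypothetical sketch of a landmark-informed visual smile transformation.
# Assumptions: MediaPipe Face Mesh for landmark detection (indices 61/291
# taken as the lip corners) and a local Gaussian warp via cv2.remap.
import cv2
import numpy as np
import mediapipe as mp

MOUTH_CORNERS = (61, 291)  # assumed Face Mesh indices for left/right lip corners

def smile_warp(frame, landmarks, alpha, radius=60.0):
    """Warp the mouth region so the lip corners appear pulled up and outward.

    alpha > 0 exaggerates the smile, alpha < 0 attenuates it, alpha = 0 is identity.
    """
    h, w = frame.shape[:2]
    # Identity sampling grid for cv2.remap: dst(x, y) = src(map_x, map_y)
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    map_x, map_y = grid_x.copy(), grid_y.copy()
    strength = alpha * 0.3 * radius  # peak displacement in pixels
    for idx, sign in zip(MOUTH_CORNERS, (-1.0, 1.0)):
        cx, cy = landmarks[idx].x * w, landmarks[idx].y * h
        # Gaussian falloff keeps the deformation local to each mouth corner
        g = np.exp(-((grid_x - cx) ** 2 + (grid_y - cy) ** 2) / (2.0 * radius ** 2))
        map_x -= sign * strength * g  # content shifts outward, toward the cheeks
        map_y += strength * g         # content shifts upward
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
cap = cv2.VideoCapture(0)  # incoming video stream (webcam stands in here)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_face_landmarks:
        lms = result.multi_face_landmarks[0].landmark
        frame = smile_warp(frame, lms, alpha=0.7)
    cv2.imshow("smile transformation (sketch)", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```

The audio pipeline described in the abstract would operate analogously but on acoustic landmarks, e.g., warping spectral features such as formant positions frame by frame; the specifics of that transformation are not detailed here.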
