Abstract

The movements of articulators, such as the lips, tongue, and teeth, play an important role in enriching language expression by revealing information hidden in text or speech. Hence, it is necessary to mine and visualize the relationships between text, speech, and articulatory movements in order to understand language across multiple modalities and levels. As a case study, given text and audio of President Donald John Trump, this paper synthesizes a high-quality 3D animation of him speaking, with accurate synchronization between speech and articulators. First, visual co-articulation is modeled by learning the mapping from text/speech to articulatory movements. Then, based on a reconstructed 3D head model, physiological characteristics and statistical learning are combined to visualize each phoneme. Finally, the visualization results of consecutive phonemes are fused by the visual co-articulation model to generate synchronized articulatory animations. Experiments show that the system not only produces photo-realistic results from the frontal view but also distinguishes the visual differences among phonemes from unconstrained views.
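The fusion of consecutive phonemes can be pictured as a weighted blend of per-phoneme articulator targets over time. The sketch below is only an illustration under assumed conventions (a hypothetical exponential dominance weighting and a toy 2-D articulator space), not the paper's actual co-articulation model.

```python
import numpy as np

# Illustrative sketch only: blend per-phoneme articulator targets into a
# smooth trajectory using normalized dominance weights centered on each
# phoneme's time. All names and parameters here are assumptions.

def dominance(t, center, strength=1.0, rate=8.0):
    """Dominance of one phoneme at times t, peaking at its temporal center."""
    return strength * np.exp(-rate * np.abs(t - center))

def blend_trajectory(targets, centers, times):
    """
    targets: (P, D) articulator parameters per phoneme (hypothetical values).
    centers: (P,)   phoneme center times in seconds.
    times:   (T,)   query times.
    Returns  (T, D) blended articulatory trajectory.
    """
    W = np.stack([dominance(times, c) for c in centers], axis=1)  # (T, P)
    W /= W.sum(axis=1, keepdims=True)                             # normalize weights per frame
    return W @ targets                                            # weighted blend of targets

# Toy example: three phonemes in a 2-D articulator space (e.g. lip opening, protrusion)
targets = np.array([[0.2, 0.1], [0.8, 0.3], [0.4, 0.6]])
centers = np.array([0.05, 0.20, 0.35])
times = np.linspace(0.0, 0.4, 50)
traj = blend_trajectory(targets, centers, times)
print(traj.shape)  # (50, 2)
```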
