Abstract
This paper presents the complete process of creating an audio-visual speech synthesis system. Such a system consists of two main parts: acoustic synthesis emulating human speech, and facial animation emulating human lip articulation. The acoustic subsystem is based on concatenative speech synthesis. The visual subsystem is designed as a realistic, fully three-dimensional, parametrically controllable facial animation model. To control the animation parametrically so that it emulates human articulation, a set of visual parameters has to be obtained for all basic speech units. To provide realistic animation, a database of lip movements of a real person needs to be recorded and expressed by a suitable parameterization. The set of control parameters for the visual animation is then derived from this database. A 3D head model based on the head of a real person also makes the animation more realistic; to obtain such a model, 3D scanning of a real person is adopted. We present the design and implementation of the above-mentioned process. The aim is to obtain realistic audio-visual speech synthesis with the possibility of adapting the 3D head model to a particular person. The design, acquisition, and processing of an audio-visual speech corpus for this purpose is presented. Next, the process of both acoustic and visual speech synthesis is described. The visual speech synthesis comprises the tasks of model training, animation control, and co-articulation modelling. Facial animation can also increase the intelligibility of telephone speech for people with hearing disabilities; in such a case, the textual information needed to control the animation is not available. A solution to the problem of mapping visual parameters from the speech signal, either directly or through recognized text, is presented. Furthermore, the 3D scanning algorithm is presented; it makes it possible to obtain a realistic 3D model based on the head of a real person and thus to personalize the talking head. At the end of this paper, an evaluation of the intelligibility of the presented audio-visual speech synthesis and its possible applications are presented.
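
To illustrate the kind of co-articulation modelling the abstract refers to, the following is a minimal Python sketch of dominance-function blending in the style of Cohen and Massaro; it is not the authors' trained model, and the parameter names (lip_width, lip_height, jaw_open), the exponential dominance shape, and the example viseme targets are assumptions made purely for illustration:

import numpy as np

def dominance(t, center, magnitude=1.0, rate=4.0):
    # Exponentially decaying influence of a viseme around its time centre.
    return magnitude * np.exp(-rate * abs(t - center))

def blend_parameters(t, segments):
    # Weighted average of per-viseme articulation targets at time t.
    # segments: list of (center_time, target_vector) pairs, one per viseme.
    weights = np.array([dominance(t, c) for c, _ in segments])
    targets = np.array([v for _, v in segments])
    return weights @ targets / weights.sum()

# Hypothetical targets for two visemes: [lip_width, lip_height, jaw_open]
segments = [(0.10, np.array([0.8, 0.2, 0.3])),   # e.g. a bilabial closure
            (0.25, np.array([0.4, 0.7, 0.6]))]   # e.g. an open vowel
for t in (0.10, 0.18, 0.25):
    print(t, blend_parameters(t, segments))

At a time between the two viseme centres the blended parameter vector lies between the two targets, which is the basic effect a co-articulation model has on the animation control parameters.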