Lip movement generation using restricted Boltzmann machines for visual speech synthesis

Zheng-Chen Liu, Li-Rong Dai, Zhen-Hua Ling

doi:10.1109/chinasip.2015.7230475

Abstract

This paper proposes methods that use restricted Boltzmann machines (RBMs) to generate sequences of lip images for visual speech synthesis. The aim is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM) based statistical approach to lip synthesis. Two model structures that use RBMs to model and generate lip movements are investigated. First, RBMs replace Gaussian distributions as the density functions of HMM states. Second, a deep belief network (DBN) is constructed by stacking multiple RBMs to model the joint distribution of the lip image of each frame and its corresponding context features. Experimental results show that the proposed methods significantly improve the quality of the generated lip images. The DBN model structure with raw pixel features achieves the best performance in our experiments.
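
As a rough illustration of the two building blocks named in the abstract, the sketch below shows a toy binary RBM used as an unnormalized density (via its free energy) and a greedy stack of RBMs in the style of a DBN. It is not the authors' implementation. The class BernoulliRBM, the function train_stack, the use of binary units, one-step contrastive divergence (CD-1), and all sizes and learning rates are assumptions made for illustration. Real-valued pixel features would normally call for Gaussian visible units, and the abstract does not say how image and context features are combined; concatenating them into one visible vector is one plausible option.

```python
import numpy as np


class BernoulliRBM:
    """Toy binary-binary RBM (hypothetical helper, not the paper's model)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        # p(h_j = 1 | v)
        return self._sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        # p(v_i = 1 | h)
        return self._sigmoid(h @ self.W.T + self.b)

    def free_energy(self, v):
        # F(v) = -b.v - sum_j log(1 + exp(c_j + v.W[:, j]));
        # p(v) is proportional to exp(-F(v)), so lower F means higher density.
        hidden_term = np.logaddexp(0.0, v @ self.W + self.c).sum(axis=-1)
        return -(v @ self.b) - hidden_term

    def cd1_step(self, v0, lr=0.1):
        # One CD-1 update on a batch v0 of shape (n_samples, n_visible).
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        pv1 = self.visible_probs(h0)   # mean-field reconstruction
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))  # reconstruction error


def train_stack(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise training: each RBM is fit on the hidden
    activation probabilities produced by the RBM below it."""
    rbms, x = [], data
    for i, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(x.shape[1], n_hidden, seed=seed + i)
        for _ in range(epochs):
            rbm.cd1_step(x, lr)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)
    return rbms
```

In the first structure an RBM's free energy would stand in for a state's Gaussian log-density up to a normalizing constant. In the second, the visible vector passed to train_stack could hold a frame's image features together with its context features.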
