Abstract

This paper proposes methods that use restricted Boltzmann machines (RBMs) to generate sequences of lip images for visual speech synthesis. The aim is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM) based statistical approach to lip synthesis. Two model structures that use RBMs to model and generate lip movements are investigated. First, RBMs replace Gaussian distributions as the density functions of HMM states. Second, a deep belief network (DBN) is constructed by stacking multiple RBMs to model the joint distribution between the lip image of each frame and its corresponding context features. Experimental results show that the proposed methods significantly improve the quality of the generated lip images. The DBN model structure with raw pixel features achieves the best performance in our experiments.
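
As a rough illustration of the RBM building block named above, the following is a minimal sketch of a binary (Bernoulli-Bernoulli) RBM with a single contrastive divergence (CD-1) update. It is not the authors' implementation: the abstract does not state the unit types, training procedure, or hyperparameters, so the binary units, the CD-1 rule, and the names `hidden_probs`, `visible_probs`, and `cd1_update` are assumptions made only for illustration. Real-valued pixel features would typically call for Gaussian visible units instead.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    # P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # P(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    # Draw independent binary states from the given probabilities.
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary vector v0 (assumed training rule).

    Updates W, b, c in place and returns the reconstruction probabilities.
    """
    ph0 = hidden_probs(v0, W, c)      # positive phase
    h0 = sample(ph0, rng)             # sampled hidden states
    pv1 = visible_probs(h0, W, b)     # reconstruction (probabilities)
    ph1 = hidden_probs(pv1, W, c)     # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return pv1


if __name__ == "__main__":
    rng = random.Random(0)
    n_visible, n_hidden = 4, 2        # toy sizes, purely illustrative
    W = [[0.0] * n_hidden for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    for _ in range(100):
        recon = cd1_update([1, 0, 1, 0], W, b, c, lr=0.1, rng=rng)
    print(recon)
```

A DBN in the sense used above could be approximated by training one such RBM, feeding its hidden probabilities as the visible input of the next RBM, and repeating layer by layer; that greedy stacking is likewise an assumption about the general technique rather than a description of the paper's setup.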

