Abstract
Most automatic speech recognition systems have concentrated exclusively on the acoustic speech signal, and therefore they are susceptible to acoustic noise. The benefits from visual speech cues have motivated significant interest in automatic lip-reading, which aims at improving automatic speech recognition by exploiting informative visual features of a speaker's mouth region, which means speaker lip motion stands out as the most linguistically visual feature. In this paper, we present a new improved robust lip location and tracking approach, aims at improving the lip-reading accuracy. Lip regions of interest are detected by a new method, combining with Intel Open source (OpenCV). In this new method, we analyze the distribution relationship between faces, eyes and mouth, and then the mouth region can be easily located. It can be proved as an effective method for lip tracking. In the subsequent step, color space is transferred to Lab from RGB color space, and a component of Lab color space is used for extracting lip segmentation and tracking lip region more accurately and efficiently from video sequences of a speaker's talking face in different lighting conditions, and with different lip shapes and head poses. Extensive experiments show that our proposed method can achieve superior performance to other similar lip tracking approaches, and then can be effectively integrated in lip-reading or visual speech recognition systems.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have