Abstract

Audio-visual speech recognition (AVSR) has become an active research topic in recent years, driven by the limitations of audio-only automatic speech recognition (ASR). With the aid of the visual signal, AVSR outperforms ASR under adverse conditions such as noisy environments. A key element of a well-performing AVSR system is its front-end lip detection. Rather than running a conventional face detection step before lip detection and localization, this paper presents a direct lip detection technique based on colour feature clustering, with no need for prior face detection. A cubic spline interpolant of the lip colour boundary is used in the direct lip detection process. The detected lips are then passed to a Kalman filter-based tracking system, which estimates the lip position in succeeding frames. The feature coefficients extracted from the visual and audio signals are recognized separately by two independent Hidden Markov Models (HMMs), and the final AVSR result is produced by integrating the outputs of both systems. Simulation results show that the proposed method performs well.
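The abstract does not specify the state model of the Kalman filter-based lip tracker. The sketch below is therefore an illustration under stated assumptions, not the paper's implementation. It assumes the tracked quantity is the lip centroid (x, y) in pixels, that each axis follows an independent constant-velocity model driven by white-noise acceleration, and that the detector supplies one centroid measurement per frame. The class names AxisKalman and LipTracker and the noise values q and r are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AxisKalman:
    """Hypothetical 1-D constant-velocity Kalman filter; state = (position, velocity)."""
    q: float = 1.0   # assumed process-noise intensity (white-noise acceleration)
    r: float = 4.0   # assumed measurement-noise variance, in pixels^2
    x: list = field(default_factory=lambda: [0.0, 0.0])
    P: list = field(default_factory=lambda: [[1e3, 0.0], [0.0, 1e3]])

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state one frame ahead and return the predicted position."""
        p, v = self.x
        self.x = [p + dt * v, v]
        P = self.P
        # Covariance propagation F P F^T with F = [[1, dt], [0, 1]].
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1]
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1]
        # Add process noise Q for the white-noise acceleration model.
        q = self.q
        self.P = [[p00 + q * dt**3 / 3, p01 + q * dt**2 / 2],
                  [p10 + q * dt**2 / 2, p11 + q * dt]]
        return self.x[0]

    def update(self, z: float) -> float:
        """Correct the state with a measured position z (H = [1, 0])."""
        P = self.P
        s = P[0][0] + self.r                  # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s     # Kalman gain
        innov = z - self.x[0]
        self.x = [self.x[0] + k0 * innov, self.x[1] + k1 * innov]
        # Covariance update (I - K H) P.
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0]


class LipTracker:
    """Tracks a lip centroid with two decoupled per-axis filters (an assumption)."""

    def __init__(self, x0: float, y0: float, q: float = 1.0, r: float = 4.0):
        self.ax = AxisKalman(q=q, r=r, x=[x0, 0.0])
        self.ay = AxisKalman(q=q, r=r, x=[y0, 0.0])

    def update(self, cx: float, cy: float):
        """Fuse a detected centroid from the current frame."""
        return self.ax.update(cx), self.ay.update(cy)

    def predict(self):
        """Estimate where the lips will appear in the next frame."""
        return self.ax.predict(), self.ay.predict()


if __name__ == "__main__":
    # Synthetic detections: the lip centroid drifts 2 px/frame right, 0.5 px/frame down.
    tracker = LipTracker(100.0, 50.0)
    for t in range(10):
        tracker.update(100.0 + 2.0 * t, 50.0 + 0.5 * t)
        nx, ny = tracker.predict()
        print(f"frame {t + 1}: predicted centroid ({nx:.2f}, {ny:.2f})")
```

Splitting the filter into two scalar-axis filters keeps the sketch free of matrix libraries and is exact only when the x and y noises are uncorrelated. A tracker that also follows lip width and height, or that models cross-axis coupling, would use the full matrix form instead. In a detection pipeline, the predicted centroid would typically narrow the search region for the colour-based lip detector in the next frame.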

