Abstract

Lip motion stands out as a relevant visual feature for active speaker detection and speech recognition. In this paper, a new approach for lip detection and visual voice activity detection is proposed. First, the algorithm performs skin segmentation to reduce the search area for lip extraction, and the most likely lip and non-lip regions within the delimited area are identified using a Bayesian approach. Then, the final lip segmentation is obtained by thresholding the resulting probabilities and applying simple morphological operators. Finally, the temporal motion of the lips is modeled with Hidden Markov Models (HMMs) to detect the likely occurrence of active speech within a temporal window.
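The pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes skin segmentation has already produced the candidate pixels. Each class (lip, non-lip skin) is modeled with a hypothetical 1-D Gaussian over a single colour feature, and a 3×3 morphological opening performs the clean-up. A single two-state (silent/speaking) HMM with forward filtering runs over a discretised lip-motion signal. The abstract does not specify the features, models, thresholds, or HMM topology, so every name and parameter value below is illustrative.

```python
import math

# All function names and parameter values are hypothetical; not taken from the paper.

# --- Bayesian lip / non-lip pixel classification ---
def gaussian_pdf(x, mean, std):
    """1-D Gaussian density, used here as an assumed class-conditional likelihood."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def lip_posterior(x, lip_model, nonlip_model, prior_lip=0.3):
    """P(lip | x) by Bayes' rule; each model is an assumed (mean, std) pair."""
    p_lip = gaussian_pdf(x, *lip_model) * prior_lip
    p_non = gaussian_pdf(x, *nonlip_model) * (1.0 - prior_lip)
    total = p_lip + p_non
    return 0.0 if total == 0.0 else p_lip / total  # both underflowed: treat as non-lip


# --- Thresholding + morphological opening (3x3 square, outside pixels = 0) ---
def threshold(prob_map, t=0.5):
    return [[1 if p >= t else 0 for p in row] for row in prob_map]


def _morph(mask, combine):
    """Apply combine (all = erosion, any = dilation) over each 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [mask[i + di][j + dj] if 0 <= i + di < h and 0 <= j + dj < w else 0
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = 1 if combine(window) else 0
    return out


def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the 3x3 element."""
    return _morph(_morph(mask, all), any)


# --- Two-state HMM over a discretised lip-motion signal ---
# States: 0 = silent, 1 = speaking. Observations: 0 = small motion, 1 = large motion.
# Assumes all start/transition/emission probabilities are strictly positive.
def speech_posteriors(obs, start, trans, emit):
    """Normalised forward recursion: P(speaking at t | obs[0..t]) for each frame t."""
    alpha = [start[s] * emit[s][obs[0]] for s in (0, 1)]
    posteriors = []
    for t, o in enumerate(obs):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in (0, 1)) * emit[s][o]
                     for s in (0, 1)]
        z = alpha[0] + alpha[1]
        alpha = [a / z for a in alpha]
        posteriors.append(alpha[1])
    return posteriors


def speech_in_window(posteriors, start, end, thresh=0.5):
    """Flag speech if the mean speaking posterior over frames [start, end) exceeds thresh."""
    window = posteriors[start:end]
    return sum(window) / len(window) > thresh


if __name__ == "__main__":
    lip_model, nonlip_model = (150.0, 10.0), (100.0, 10.0)  # illustrative colour models
    # 5x5 feature grid: a 3x3 lip-coloured block plus one stray lip-coloured pixel at (0, 0)
    feature = [[150 if (1 <= i <= 3 and 1 <= j <= 3) or (i, j) == (0, 0) else 100
                for j in range(5)] for i in range(5)]
    probs = [[lip_posterior(x, lip_model, nonlip_model) for x in row] for row in feature]
    for row in opening(threshold(probs)):
        print(row)  # the stray pixel at (0, 0) is removed; the 3x3 block survives

    post = speech_posteriors([0, 0, 0, 1, 1, 1], start=(0.5, 0.5),
                             trans=[[0.9, 0.1], [0.1, 0.9]],
                             emit=[[0.8, 0.2], [0.3, 0.7]])
    print(speech_in_window(post, 0, 3))  # False: little lip motion
    print(speech_in_window(post, 3, 6))  # True: sustained lip motion
```

A single HMM with forward filtering yields a per-frame speaking probability that can be averaged over any temporal window. An equally plausible reading of the abstract is to train separate speech and non-speech HMMs and label each window by whichever model assigns it the higher likelihood; the sketch does not claim which variant the paper uses.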
