Abstract
Visual information is an important complement to audio information, and the apparent asynchrony between acoustic and visual cues can be represented effectively by asynchronous articulatory features. A new approach to speaker identification is presented that uses an articulatory feature-based audio-visual model built on a dynamic Bayesian network. Experiments on the bimodal audio-visual CMU database produced satisfactory results.
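The following is a minimal sketch of the underlying idea of bounded audio-visual asynchrony. It is not the authors' dynamic Bayesian network. Instead it uses a simplified, assumed stand-in: a two-stream left-to-right model whose joint state pairs an audio state with a visual state, allows the two streams to drift apart by at most `max_async` states, and scores each speaker with the forward algorithm. The names `AsyncAVModel` and `identify`, the 1-D Gaussian features, and the stay/advance transition scheme are all hypothetical illustrations. The speaker is chosen by maximum likelihood over per-speaker models.

```python
import math
from itertools import product


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (assumed feature model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class AsyncAVModel:
    """Hypothetical two-stream model with bounded audio-visual asynchrony."""

    def __init__(self, audio_means, visual_means, var=1.0, p_stay=0.6, max_async=1):
        assert len(audio_means) == len(visual_means)
        self.n = len(audio_means)
        self.audio_means = audio_means
        self.visual_means = visual_means
        self.var = var
        self.p_stay = p_stay
        self.max_async = max_async
        # Joint states (a, v) whose stream positions differ by at most max_async.
        self.states = [(a, v) for a, v in product(range(self.n), repeat=2)
                       if abs(a - v) <= max_async]

    def _log_trans(self, src):
        """Each stream stays or advances; disallowed moves are dropped and the rest renormalized."""
        a, v = src
        cands = []
        for da, dv in product((0, 1), repeat=2):
            na, nv = a + da, v + dv
            if na < self.n and nv < self.n and abs(na - nv) <= self.max_async:
                pa = self.p_stay if da == 0 else 1 - self.p_stay
                pv = self.p_stay if dv == 0 else 1 - self.p_stay
                cands.append(((na, nv), pa * pv))
        total = sum(p for _, p in cands)
        return [(s, math.log(p / total)) for s, p in cands]

    def _emit(self, state, x_audio, x_visual):
        """Streams are treated as conditionally independent given the joint state."""
        a, v = state
        return (log_gauss(x_audio, self.audio_means[a], self.var)
                + log_gauss(x_visual, self.visual_means[v], self.var))

    def log_likelihood(self, audio, visual):
        """Forward algorithm in log space over the joint (audio, visual) states."""
        assert len(audio) == len(visual) and len(audio) > 0
        alpha = {s: -math.inf for s in self.states}
        alpha[(0, 0)] = self._emit((0, 0), audio[0], visual[0])
        for t in range(1, len(audio)):
            incoming = {s: [] for s in self.states}
            for s, la in alpha.items():
                if la == -math.inf:
                    continue
                for nxt, lt in self._log_trans(s):
                    incoming[nxt].append(la + lt)
            alpha = {s: (logsumexp(terms) + self._emit(s, audio[t], visual[t]))
                     if terms else -math.inf
                     for s, terms in incoming.items()}
        # For simplicity, any end state is accepted (no final-state constraint).
        return logsumexp(list(alpha.values()))


def identify(models, audio, visual):
    """Return the speaker whose model gives the highest log-likelihood, plus all scores."""
    scores = {spk: m.log_likelihood(audio, visual) for spk, m in models.items()}
    return max(scores, key=scores.get), scores


# Toy usage with made-up feature values (illustrative only).
models = {
    "spk1": AsyncAVModel([0.0, 2.0, 4.0], [1.0, 3.0, 5.0]),
    "spk2": AsyncAVModel([5.0, 3.0, 1.0], [0.0, 0.0, 0.0]),
}
audio = [0.1, 0.2, 2.1, 3.9, 4.0]
visual = [1.0, 0.9, 1.2, 3.1, 5.0]  # the visual stream lags the audio by one state
best, scores = identify(models, audio, visual)
print(best)  # → spk1
```

Setting `max_async=0` collapses the model to a synchronous one (only states with `a == v`). A larger value lets the lips lead or lag the acoustics, which is the kind of asynchrony the articulatory feature approach is meant to capture.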