Abstract

Visual information provides a useful complement to the audio signal, and the apparent asynchrony between acoustic and visual cues can be represented effectively by asynchronous articulatory features. A new approach to speaker identification is presented, using an articulatory feature-based audio-visual model built on a dynamic Bayesian network. Experiments on the audio-visual bimodal CMU database yielded satisfactory results.
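
The abstract does not describe the model structure in detail. As a rough illustration only, the sketch below shows one common way to approximate an asynchronous two-stream model of this kind: a product-HMM-style Viterbi search in which the audio and visual articulatory state indices may drift apart by at most a fixed number of states, and speaker identification picks the speaker model with the best joint score. All names, dimensions, and parameter values here are assumptions made for demonstration; this is not the paper's actual dynamic Bayesian network.

```python
# Minimal sketch (assumed setup, not the paper's model): two-stream decoding
# with bounded asynchrony between audio and visual articulatory states.
import numpy as np

MAX_ASYNC = 1                  # max allowed state lag between the two streams (assumed)
N_STATES = 3                   # articulatory states per speaker model (assumed)
AUDIO_DIM, VISUAL_DIM = 4, 2   # toy feature dimensions (assumed)


def log_gauss(x, mean, var):
    """Diagonal-covariance Gaussian log-density for one frame."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def score_speaker(audio, visual, model):
    """Viterbi-style search over joint (audio_state, visual_state) pairs.

    audio:  (T, AUDIO_DIM) frames; visual: (T, VISUAL_DIM) frames.
    model:  dict with per-state Gaussian means/vars for each stream
            ("a_mean", "a_var", "v_mean", "v_var"), each of shape (N_STATES, dim).
    """
    T = len(audio)
    NEG = -np.inf
    # delta[i, j] = best log-score with audio in state i, visual in state j.
    delta = np.full((N_STATES, N_STATES), NEG)
    delta[0, 0] = (log_gauss(audio[0], model["a_mean"][0], model["a_var"][0])
                   + log_gauss(visual[0], model["v_mean"][0], model["v_var"][0]))
    for t in range(1, T):
        new = np.full_like(delta, NEG)
        for i in range(N_STATES):
            for j in range(N_STATES):
                if abs(i - j) > MAX_ASYNC:
                    continue  # enforce bounded asynchrony between streams
                # Each stream either stays in its state or advances by one.
                prev = max(delta[pi, pj]
                           for pi in (i - 1, i) if pi >= 0
                           for pj in (j - 1, j) if pj >= 0)
                if prev == NEG:
                    continue
                new[i, j] = (prev
                             + log_gauss(audio[t], model["a_mean"][i], model["a_var"][i])
                             + log_gauss(visual[t], model["v_mean"][j], model["v_var"][j]))
        delta = new
    # Require both streams to end in the final articulatory state.
    return delta[N_STATES - 1, N_STATES - 1]


def identify(audio, visual, speaker_models):
    """Return the speaker whose model best explains the audio-visual frames."""
    return max(speaker_models,
               key=lambda s: score_speaker(audio, visual, speaker_models[s]))
```

In this toy formulation, the bounded-asynchrony constraint is what lets the visual stream lag or lead the acoustic stream by a small amount, which is the intuition the abstract attributes to asynchronous articulatory features.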
