Abstract

This work investigates the effective use of two modalities, audio and visual input, in designing a functional audio-visual speech recognition system. The results presented were obtained on the vVISWa (visual Vocabulary of Isolated Standard Words) dataset of audio-visual words and on the CUAVE (Clemson University Audio-Visual Experiments) database. Discrete cosine transform (DCT) and local binary pattern (LBP) features of the full frontal visual profile were fused with mel-frequency cepstral coefficient (MFCC) features of the acoustic signal, and the fused feature vectors were classified using a random forest classifier.
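The fusion scheme the abstract describes can be sketched as feature-level concatenation: a 2-D DCT and an LBP histogram computed on a visual (mouth-region) frame, concatenated with MFCCs from the acoustic signal, then fed to a random forest. The sketch below is a minimal illustration under assumed parameters (20×20 retained DCT coefficients, a 256-bin 3×3 LBP, 13 MFCCs from a single 512-sample frame at 16 kHz, and synthetic toy data); the paper's actual feature dimensions, frame handling, and dataset preprocessing are not specified in the abstract.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.ensemble import RandomForestClassifier

def dct_features(frame, k=20):
    # 2-D DCT of a grayscale frame; keep the top-left k x k low-frequency block
    c = dct(dct(frame, axis=0, norm='ortho'), axis=1, norm='ortho')
    return c[:k, :k].ravel()

def lbp_features(frame):
    # Basic 3x3 local binary pattern: threshold 8 neighbours against the centre
    # pixel, pack into an 8-bit code, and return a normalised 256-bin histogram.
    center = frame[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = frame[1 + dy:frame.shape[0] - 1 + dy, 1 + dx:frame.shape[1] - 1 + dx]
        codes |= (nb >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

def mfcc_features(signal, sr=16000, n_mfcc=13, n_fft=512, n_mels=26):
    # Single-frame MFCC sketch: power spectrum -> mel filterbank -> log -> DCT
    spec = np.abs(np.fft.rfft(signal[:n_fft] * np.hamming(n_fft))) ** 2
    mel = lambda f: 2595 * np.log10(1 + f / 700)       # Hz -> mel
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)      # mel -> Hz
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))            # triangular filterbank
    for i in range(n_mels):
        fb[i, bins[i]:bins[i + 1]] = np.linspace(0, 1, bins[i + 1] - bins[i], endpoint=False)
        fb[i, bins[i + 1]:bins[i + 2]] = np.linspace(1, 0, bins[i + 2] - bins[i + 1], endpoint=False)
    logmel = np.log(fb @ spec + 1e-10)
    return dct(logmel, norm='ortho')[:n_mfcc]

def fused_features(frame, audio):
    # Feature-level fusion: concatenate visual (DCT + LBP) and acoustic (MFCC) vectors
    return np.concatenate([dct_features(frame), lbp_features(frame), mfcc_features(audio)])

# Hypothetical toy data standing in for two word classes (not the vVISWa/CUAVE data)
rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):
    for _ in range(10):
        frame = rng.random((32, 32)) + label                       # toy visual frame
        tone = np.sin(2 * np.pi * (200 + 300 * label) * np.arange(512) / 16000)
        audio = tone + 0.1 * rng.standard_normal(512)              # toy audio frame
        X.append(fused_features(frame, audio))
        y.append(label)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

In a real system each isolated word would contribute a sequence of frames (and the abstract does not say how sequences are pooled into one vector), but the fusion-then-classify structure is the same.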
