Abstract

This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images, in an attempt to increase noise robustness in mobile environments. The proposed method assumes that lip images can be captured by a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built with the multistream HMM technique. Experiments conducted on Japanese connected-digit speech contaminated with white noise at various SNRs show the effectiveness of the proposed method: recognition accuracy is improved by the visual information in all SNR conditions. The visual features were confirmed to be effective even when the audio HMMs were adapted to the noise by the MLLR (maximum likelihood linear regression) method.
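For reference, the multistream HMM combination mentioned above is conventionally written as a weighted product of the per-stream output likelihoods. This is the standard formulation, not a detail taken from the abstract; the specific stream weights used in the paper are not stated here.

  \[
    b_j(\mathbf{o}_t) \;=\;
    \bigl[\, b_j^{A}(\mathbf{o}_t^{A}) \,\bigr]^{\lambda_A}
    \bigl[\, b_j^{V}(\mathbf{o}_t^{V}) \,\bigr]^{\lambda_V},
    \qquad \lambda_A + \lambda_V = 1,\ \ \lambda_A, \lambda_V \ge 0,
  \]

where \(b_j^{A}\) and \(b_j^{V}\) are the audio- and visual-stream output densities of HMM state \(j\), \(\mathbf{o}_t^{A}\) and \(\mathbf{o}_t^{V}\) are the audio and visual feature vectors at time \(t\), and the exponents \(\lambda_A\), \(\lambda_V\) control the relative reliance on each stream, typically shifted toward the visual stream as the SNR decreases.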
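Likewise, the MLLR adaptation referred to at the end of the abstract re-estimates the Gaussian mean vectors of the audio HMMs with an affine transform shared across a regression class. The sketch below is the standard mean-only form of MLLR; the regression-class configuration used in the paper is not given in the abstract.

  \[
    \hat{\boldsymbol{\mu}}_m \;=\; \mathbf{A}\,\boldsymbol{\mu}_m + \mathbf{b},
  \]

where \(\boldsymbol{\mu}_m\) is the mean of Gaussian component \(m\), and \(\mathbf{A}\) and \(\mathbf{b}\) are estimated to maximize the likelihood of the (noisy) adaptation data.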

Highlights

  • Audio-visual Speech Recognition Using Lip Information Extracted from Side-Face Images

  • EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2007
