Abstract

We use facial animation parameters (FAPs), supported by the MPEG-4 standard for the visual representation of speech, to significantly improve automatic speech recognition (ASR). We describe a robust, automatic algorithm for extracting FAPs from visual data that requires no hand labeling or extensive training procedures. Multi-stream hidden Markov models (HMMs) are used to integrate the audio and visual information. ASR experiments are performed under both clean and noisy audio conditions using a relatively large vocabulary (approximately 1000 words). The proposed system reduces the word error rate (WER) by 20% to 23% relative to the audio-only ASR WER at various SNRs with additive white Gaussian noise, and by 19% relative to the audio-only WER under clean audio conditions.
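As a minimal sketch of the multi-stream integration idea, the standard formulation combines per-state audio and visual likelihoods as a weighted product, log b_j(o) = λ_a log b_j^a(o_a) + λ_v log b_j^v(o_v), with λ_a + λ_v = 1. The code below illustrates this under assumed single-Gaussian stream models; the stream weights, feature dimensions, and parameter values are hypothetical and not taken from the paper.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def multistream_log_likelihood(o_audio, o_visual, state, lambda_audio=0.7):
    """Weighted log-linear combination of the audio and visual streams:
    log b_j(o) = lambda_a * log b_j^a(o_a) + lambda_v * log b_j^v(o_v),
    with lambda_a + lambda_v = 1 (weights here are illustrative)."""
    lambda_visual = 1.0 - lambda_audio
    ll_audio = gaussian_log_likelihood(o_audio, state["mean_a"], state["var_a"])
    ll_visual = gaussian_log_likelihood(o_visual, state["mean_v"], state["var_v"])
    return lambda_audio * ll_audio + lambda_visual * ll_visual

# Toy usage: one HMM state with 2-dim audio and 2-dim visual (FAP) features.
state = {
    "mean_a": np.array([0.0, 1.0]), "var_a": np.array([1.0, 0.5]),
    "mean_v": np.array([0.2, -0.3]), "var_v": np.array([0.8, 0.8]),
}
print(multistream_log_likelihood(np.array([0.1, 0.9]), np.array([0.25, -0.2]), state))
```

In practice the visual weight is typically increased at lower SNRs, where the audio stream is less reliable.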
