Abstract
This paper proposes a novel low-complexity lip contour model for high-level optic feature extraction in noise-robust audiovisual (AV) automatic speech recognition systems. The model is based on weighted least-squares parabolic fitting of the upper and lower lip contours and does not require the assumption of symmetry across the horizontal axis of the mouth, which makes it more realistic. Unlike other high-level models, it does not depend on the accurate estimation of specific facial points. We also present a novel low-complexity algorithm for speaker normalization of the optic information stream that is compatible with the proposed model and requires no parameter training. Using the proposed model with speaker normalization improves noise robustness in AV isolated-word recognition relative to the baseline high-level model.
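As a concrete illustration of the fitting step, the sketch below shows one plausible NumPy implementation of weighted least-squares parabolic fitting applied independently to upper- and lower-lip contour points, with no symmetry constraint between the two curves. The function names, the use of per-point edge-confidence weights, and the six-coefficient feature vector are illustrative assumptions made here; this is not the authors' implementation.

```python
# Minimal sketch (assumed, not from the paper): weighted least-squares parabolic
# fitting of independent upper and lower lip contours.
import numpy as np


def fit_parabola(x, y, w):
    """Fit y = a*x**2 + b*x + c by weighted least squares.

    x, y : 1-D arrays of contour point coordinates
    w    : non-negative per-point weights (e.g., edge-detection confidence)
    Returns (a, b, c), highest-order coefficient first.
    """
    # np.polyfit minimizes sum_i (w_i * (y_i - p(x_i)))**2 for the given degree.
    return np.polyfit(x, y, deg=2, w=w)


def lip_contour_features(upper_pts, lower_pts, upper_w, lower_w):
    """Fit separate parabolas to the upper and lower lip contours.

    The two curves are fitted independently, each with its own weights,
    so no symmetry across the horizontal mouth axis is imposed.
    """
    a_u, b_u, c_u = fit_parabola(upper_pts[:, 0], upper_pts[:, 1], upper_w)
    a_l, b_l, c_l = fit_parabola(lower_pts[:, 0], lower_pts[:, 1], lower_w)
    # The six coefficients form a compact high-level optic feature vector.
    return np.array([a_u, b_u, c_u, a_l, b_l, c_l])


if __name__ == "__main__":
    # Synthetic demo: noisy points around two parabolic lip edges.
    x = np.linspace(-1.0, 1.0, 40)
    rng = np.random.default_rng(0)
    upper = np.column_stack([x, -0.5 * x**2 + 0.4 + 0.02 * rng.standard_normal(x.size)])
    lower = np.column_stack([x, 0.8 * x**2 - 0.5 + 0.02 * rng.standard_normal(x.size)])
    w = np.ones_like(x)  # uniform weights for the demo
    print(lip_contour_features(upper, lower, w, w))
```

Fitting each contour separately keeps the model low-complexity (two three-parameter curves per frame) while avoiding the symmetry assumption the abstract argues against.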