Abstract

This paper reports a visual speech recognition method that is invariant to translation, rotation, and scale. Dynamic features representing mouth motion are extracted from the video data using a motion segmentation technique termed the motion history image (MHI). The MHI is generated by applying an accumulative image differencing technique to the sequence of mouth images. Invariant features are derived from the MHI using a feature extraction algorithm that combines the discrete stationary wavelet transform (SWT) and moments. A 2-D SWT at level one decomposes the MHI into one approximate and three detail sub-images. The feature descriptors consist of three sets of moments (geometric moments, Hu moments, and Zernike moments) computed from the SWT approximate sub-image. The moment features are normalized to achieve the invariance properties. An artificial neural network (ANN) trained with the back-propagation learning algorithm classifies the moment features. Initial experiments testing the sensitivity of the proposed approach to rotation, translation, and scaling of the mouth images yielded promising results.
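The pipeline outlined above (accumulative image differencing into an MHI, a level-one stationary wavelet approximation, then moment invariants) can be sketched in NumPy as below. This is a minimal illustration, not the paper's implementation: the Haar low-pass filter, the motion threshold, and the linear decay schedule are assumptions; the three SWT detail sub-images, the Zernike moments, and the ANN classifier are omitted for brevity.

```python
import numpy as np

def motion_history_image(frames, threshold=15, tau=255, decay=16):
    """Accumulative image differencing: pixels that moved in the most
    recent frame are set to `tau`; older motion fades by `decay` per
    frame. Parameter values here are illustrative, not the paper's."""
    frames = [f.astype(np.int16) for f in frames]
    mhi = np.zeros(frames[0].shape, dtype=np.float64)
    for prev, cur in zip(frames, frames[1:]):
        motion = np.abs(cur - prev) > threshold
        mhi = np.where(motion, tau, np.maximum(mhi - decay, 0))
    return mhi

def swt_haar_approx(img):
    """Level-1 stationary (undecimated) Haar low-pass along rows and
    columns: the 'approximate' sub-image at the input's full resolution."""
    rows = (img + np.roll(img, -1, axis=0)) / 2.0
    return (rows + np.roll(rows, -1, axis=1)) / 2.0

def hu_first_two(img):
    """First two Hu moment invariants, built from scale-normalized
    central moments (translation-, scale-, and rotation-invariant)."""
    img = img.astype(np.float64)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    dx, dy = x - xc, y - yc

    def eta(p, q):  # scale-normalized central moment
        return (dx**p * dy**q * img).sum() / m00 ** ((p + q) / 2 + 1)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2
```

Because the Hu invariants are built from central moments normalized by the image "mass", integer translations of the MHI (with no content clipped at the borders) leave them unchanged, which is the invariance property the abstract's normalization step targets.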
