Abstract

The performance of a Visual Speech Recognition (VSR) system is highly influenced by the choice of visual features, which fall into two categories: static and dynamic. This work exploits both lip shape (static, geometric features) and the temporal sequence of lip movements (dynamic, motion features) to build a combined VSR system with fusion at both the feature level and the model level. The system is evaluated on a digit dataset and compared against benchmark systems based on the Discrete Wavelet Transform (DWT), the Discrete Cosine Transform (DCT), and Zernike Moments (ZM). First, a Motion History Image (MHI) is computed for every viseme, from which wavelet and Zernike coefficients are extracted and modeled using a simple left-to-right (L-R) Gaussian Mixture Model (GMM) HMM. This method yields a significant improvement in performance: 85% for MHI-DWT features, 74% for MHI-DCT, and 80% for MHI-ZM. Geometric features are extracted using an Active Shape Model (ASM). Two types of fusion are used: feature-level fusion and model-level fusion. In feature-level fusion, the motion features (MHI-DWT, MHI-DCT, and MHI-ZM) are combined with the geometric (ASM) features and modeled using the GMM L-R HMM. The combined features improve accuracy to 96.5% for DWT-ASM, 84% for DCT-ASM, and 93% for ZM-ASM. Model-level fusion is performed with a two-stream HMM in which stream weights are applied to the DWT-ASM, DCT-ASM, and ZM-ASM features. Weighted model-level fusion yields a further improvement, reaching 98.2% for DWT-ASM, 85% for DCT-ASM, and 94.5% for ZM-ASM. The proposed system thus achieves higher recognition accuracy than the benchmark DWT, DCT, and ZM systems.
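The two key steps described above, building a Motion History Image from a viseme's frame sequence and combining per-stream HMM scores with a stream weight, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the threshold, decay, and weight values, and all function names, are illustrative assumptions, and the per-class log-likelihoods would in practice come from trained GMM L-R HMMs.

```python
import numpy as np

def motion_history_image(frames, tau=10, diff_threshold=30):
    """Accumulate frame-to-frame motion into a single MHI.

    frames: sequence of 2D uint8 grayscale lip-ROI images.
    Pixels with recent motion get the maximum value tau; older
    motion decays by 1 per frame. Returned MHI is scaled to [0, 1].
    (tau and diff_threshold are illustrative choices.)
    """
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        moving = diff > diff_threshold
        # set moving pixels to tau, decay the rest toward zero
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau

def fuse_streams(motion_loglik, geom_loglik, stream_weight=0.6):
    """Weighted two-stream score fusion for one utterance.

    motion_loglik, geom_loglik: per-digit-class log-likelihoods from
    the motion-feature HMMs and the ASM-feature HMMs, respectively.
    Returns the index of the winning digit class.
    (stream_weight = 0.6 is a placeholder, not the paper's tuned value.)
    """
    combined = (stream_weight * np.asarray(motion_loglik)
                + (1.0 - stream_weight) * np.asarray(geom_loglik))
    return int(np.argmax(combined))
```

In this sketch the MHI collapses a whole viseme clip into one image whose intensity encodes recency of lip motion, which is then suitable input for DWT, DCT, or Zernike feature extraction; the fusion function shows the model-level step, where each feature stream keeps its own HMM and only the log-likelihoods are mixed.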

