Abstract

Audio–visual automatic speech recognition (AV-ASR) is an emerging field of research, but there is still a lack of suitable visual features for visual speech recognition. Visual features fall mainly into two categories: shape-based and appearance-based. Because shape and appearance features carry different information, this paper proposes a new set of hybrid visual features that leads to a better visual speech recognition system. The Pseudo-Zernike Moment (PZM) is computed as the shape-based visual feature, while Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Discrete Cosine Transform (DCT) are computed as the appearance-based features. The proposed method also captures both global and local visual information, and the objective of the proposed system is to embed all of this visual information into a compact feature set. For audio speech recognition, the proposed system uses Mel-frequency cepstral coefficients (MFCC). We also propose a hybrid classification method, which is used to carry out all the AV-ASR experiments. The hybrid classifier combines an Artificial Neural Network (ANN), a multiclass Support Vector Machine (SVM) and a Naive Bayes (NB) classifier. The results show that the proposed AV-ASR system with the hybrid classifier significantly improves the recognition rate.
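
The abstract does not state how the outputs of the three classifiers are combined. As an illustration only, the sketch below assumes a simple majority vote over per-classifier labels, with ties resolved in favour of the classifier listed first. The function name `majority_vote` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Fuse one label per classifier into a single decision.

    Assumption (not from the paper): the hybrid decision is the label
    predicted by the most classifiers; on a tie, the label from the
    earliest classifier in the sequence wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk in classifier order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable: a top-count label always exists")


# Hypothetical per-utterance outputs, in the order (ANN, SVM, NB).
fused = majority_vote(["seven", "eleven", "seven"])
```

Under this assumption, two agreeing classifiers override the third, and when all three disagree the decision falls back to the first classifier in the list, so the ordering acts as a crude priority. Weighted or probability-based fusion would be equally plausible readings of "classifier hybridization".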
