Abstract
Audio–visual automatic speech recognition (AV-ASR) is an emerging field of research, but visual speech recognition still lacks suitable visual features. Visual features fall mainly into two categories: shape-based and appearance-based. Because shape and appearance features carry different information, this paper proposes a new set of hybrid visual features that leads to a better visual speech recognition system. Pseudo-Zernike Moments (PZM) are computed as the shape-based visual feature, while Local Binary Pattern-three orthogonal planes (LBP-TOP) and the Discrete Cosine Transform (DCT) are computed as the appearance-based features. The proposed method also captures both global and local visual information. The objective of the proposed system is thus to embed all of this visual information into a compact feature set. For audio speech recognition, the proposed system uses Mel-frequency cepstral coefficients (MFCC). We also propose a hybrid classification method, which is used in all the AV-ASR experiments. Artificial Neural Network (ANN), multiclass Support Vector Machine (SVM), and Naive Bayes (NB) classifiers are combined to form the hybrid classifier. The results show that the proposed AV-ASR system with the hybrid classifier significantly improves the recognition rate.
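The two hybridization ideas in the abstract can be illustrated with a minimal sketch. The abstract does not state how the features are combined or how the three classifiers are fused, so this example assumes plain concatenation for the features and majority voting for the classifiers. The function names `fuse_features` and `majority_vote`, the feature values, and the word labels are all hypothetical.

```python
from collections import Counter


def fuse_features(*feature_vectors):
    """Concatenate several feature vectors into one hybrid vector.

    Assumption: simple concatenation; the paper may use a different
    (for example, dimension-reducing) fusion scheme.
    """
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Assumption: ties are broken in favour of the classifier that
    appears first in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Placeholder values standing in for PZM, LBP-TOP, and DCT features of one sample.
pzm = [0.12, 0.40]
lbp_top = [3.0, 1.0, 7.0]
dct = [15.2, -4.1]
hybrid = fuse_features(pzm, lbp_top, dct)

# Placeholder outputs standing in for the ANN, SVM, and NB classifiers.
decision = majority_vote(["yes", "no", "yes"])
```

Here `hybrid` holds the seven concatenated values and `decision` is the label chosen by two of the three classifiers. Other fusion rules, such as weighting each classifier by its validation accuracy, would fit the same interface.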