Abstract

A novel scheme for disambiguating conflicting classification results in audio-visual speech recognition (AVSR) applications is proposed in this chapter. The strategy can be implemented with both generative and discriminative models, and it can be employed interchangeably with different kinds of input information, viz., audio, visual, or audio-visual information. The proposed training procedure introduces the concept of complementary models. A complementary model to a particular class j is a model trained with instances of all classes except those associated with class j. The main idea is to detect the absence of a class using the complementary models, that is, given a particular instance of class i, to detect which complementary model has not been trained with instances of class i. These complementary models are combined with traditional models in a cascade scheme to improve the recognition rates obtained with traditional models alone. The performance of the proposed recognition system is evaluated on three publicly available audio-visual datasets, using a generative model, namely a hidden Markov model, and three discriminative techniques, viz., random forests, support vector machines, and adaptive boosting. The experimental results are promising: across the three datasets, the different models, and the different input modalities, the recognition rates improve over other methods reported on the same datasets.
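
The complementary-model idea can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names are hypothetical, GaussianMixture stands in for the paper's hidden Markov models or discriminative classifiers, and the disambiguation rule is only meant to convey the cascade principle of picking the class whose complementary model scores the instance lowest.

```python
# Minimal sketch of the complementary-model idea, assuming scikit-learn.
# GaussianMixture is used here as an illustrative stand-in for the
# generative/discriminative models described in the chapter.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_complementary_models(X, y, n_components=4):
    """For each class j, fit a model on samples of every class except j."""
    comp_models = {}
    for j in np.unique(y):
        X_rest = X[y != j]                      # all instances NOT of class j
        gm = GaussianMixture(n_components=n_components, random_state=0)
        comp_models[j] = gm.fit(X_rest)
    return comp_models

def disambiguate(x, candidate_classes, comp_models):
    """Among conflicting candidates from a traditional classifier, pick the
    class whose complementary model assigns the lowest score to x: that
    model has never seen instances of its own class during training."""
    scores = {j: comp_models[j].score(x.reshape(1, -1))
              for j in candidate_classes}
    return min(scores, key=scores.get)
```

In a cascade of this kind, a traditional classifier would first propose its top hypotheses, and the complementary models would only be consulted to break ties among conflicting candidates.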
