Abstract

This paper describes the combination of two recent machine learning techniques for acoustic modeling in speech recognition: deep neural networks (DNNs) and graph-based semi-supervised learning (SSL). While DNNs have been shown to be powerful supervised classifiers and have achieved considerable success in speech recognition, graph-based SSL can exploit valuable complementary information derived from the manifold structure of the unlabeled test data. Previous work on graph-based SSL in acoustic modeling has been limited to frame-level classification tasks and has not been compared to, or integrated with, state-of-the-art DNN/HMM recognizers. This paper presents the first integration of graph-based SSL with DNN-based speech recognition and analyzes its effect on word recognition performance. The approach is evaluated on two small-vocabulary speech recognition tasks and shows a significant improvement in HMM state classification accuracy as well as a consistent reduction in word error rate over a state-of-the-art DNN/HMM baseline.
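The graph-based SSL idea the abstract refers to can be illustrated with a minimal label-propagation sketch (in the style of Zhou et al.'s "local and global consistency" method, which is one common graph-based SSL formulation; the paper itself may use a different propagation objective). Labeled and unlabeled points are connected by a similarity graph, and class scores are smoothed over the graph so that nearby points on the data manifold receive similar labels. All function and parameter names below are illustrative, not from the paper:

```python
import numpy as np

def label_propagation(X, y, n_labeled, sigma=1.0, alpha=0.9, n_iter=100):
    """Toy graph-based SSL via iterative label propagation.

    X: (n, d) feature matrix; the first n_labeled rows are labeled.
    y: (n_labeled,) integer class labels for those rows.
    Returns predicted class indices for all n points.
    """
    n = X.shape[0]
    # Gaussian affinity over all points (labeled + unlabeled)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # One-hot seed scores; unlabeled rows start at zero
    n_classes = int(y.max()) + 1
    Y = np.zeros((n, n_classes))
    Y[np.arange(n_labeled), y] = 1.0
    # Iterate: blend graph-smoothed scores with the original seeds
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)
```

In the paper's setting the "labels" would be HMM state posteriors from the DNN rather than hard one-hot seeds, and the graph would be built over acoustic feature vectors, but the smoothing principle is the same.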
