Abstract

In this paper, we investigate how to apply graph-based semisupervised learning to acoustic modeling in speech recognition. Graph-based semisupervised learning is a widely used transductive semisupervised learning method in which labeled and unlabeled data are jointly represented as a weighted graph; the resulting graph structure is then used as a constraint during the classification of unlabeled data points. We investigate suitable graph-based learning algorithms for speech data and evaluate two different frameworks for integrating graph-based learning into state-of-the-art, deep neural network (DNN)-based speech recognition systems. The first framework utilizes graph-based learning in parallel with a DNN classifier within a lattice-rescoring framework, whereas the second framework relies on an embedding of graph neighborhood information into continuous space using an autoencoder. We demonstrate significant improvements in frame-level phonetic classification accuracy and consistent reductions in word error rate on large-vocabulary conversational speech recognition tasks.
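To illustrate the core idea of graph-based semisupervised learning described above, the following is a minimal label-propagation sketch on toy one-dimensional data (the data, kernel bandwidth, and iteration count are illustrative assumptions, not the paper's actual setup): labeled and unlabeled points are joined in one weighted similarity graph, and labels diffuse along the edges while labeled points stay clamped, so the graph structure constrains the classification of the unlabeled points.

```python
import numpy as np

# Toy data: four labeled points in two clusters, one unlabeled point.
# (Hypothetical example data; the paper works with speech frames instead.)
X = np.array([[0.0], [0.1], [0.9], [1.0], [0.8]])
y = np.array([0, 0, 1, 1, -1])  # -1 marks the unlabeled point

# Edge weights from an RBF kernel over all points (the similarity graph).
W = np.exp(-((X - X.T) ** 2) / 0.1)
np.fill_diagonal(W, 0.0)
P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix

# One-hot label matrix; unlabeled rows start at zero.
labeled = y >= 0
F = np.zeros((len(X), 2))
F[labeled] = np.eye(2)[y[labeled]]

# Iteratively propagate labels along graph edges, clamping labeled points.
for _ in range(100):
    F = P @ F
    F[labeled] = np.eye(2)[y[labeled]]

pred = F.argmax(axis=1)  # the unlabeled point inherits its neighbors' label
```

Because the unlabeled point at 0.8 has far stronger edges to the cluster around 0.9 and 1.0 than to the cluster around 0.0, propagation assigns it label 1; this neighborhood-driven assignment is the constraint that the graph imposes on classification.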
