Abstract

In this paper, we investigate how to apply graph-based semisupervised learning to acoustic modeling in speech recognition. Graph-based semisupervised learning is a widely used transductive semisupervised learning method in which labeled and unlabeled data are jointly represented as a weighted graph; the resulting graph structure is then used as a constraint during the classification of unlabeled data points. We investigate suitable graph-based learning algorithms for speech data and evaluate two different frameworks for integrating graph-based learning into state-of-the-art, deep neural network (DNN)-based speech recognition systems. The first framework uses graph-based learning in parallel with a DNN classifier within a lattice-rescoring framework, whereas the second framework relies on an embedding of graph neighborhood information into continuous space using an autoencoder. We demonstrate significant improvements in frame-level phonetic classification accuracy and consistent reductions in word error rate on large-vocabulary conversational speech recognition tasks.
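To make the general idea of graph-based semisupervised learning concrete, the following is a minimal sketch of generic iterative label propagation over a k-nearest-neighbor graph. It is not the algorithm evaluated in this paper, which the abstract does not specify. The function name `label_propagation`, the Gaussian edge weights, and the parameters `k`, `sigma`, and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def label_propagation(X, y, n_classes, k=3, sigma=1.0, n_iter=50):
    """Generic label propagation sketch (hypothetical helper, not the paper's method).

    X: (n, d) feature matrix holding labeled and unlabeled points together.
    y: (n,) integer labels, with -1 marking unlabeled points.
    Returns an (n, n_classes) matrix of class distributions.
    """
    n = X.shape[0]
    # Gaussian similarity between every pair of points (assumed weighting).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Keep only the k strongest edges per node, then symmetrize the graph.
    drop = np.argsort(-W, axis=1)[:, k:]
    np.put_along_axis(W, drop, 0.0, axis=1)
    W = np.maximum(W, W.T)

    # Row-normalize so each node averages over its neighbors.
    P = W / W.sum(axis=1, keepdims=True)

    labeled = y >= 0
    Y = np.eye(n_classes)[y[labeled]]  # one-hot targets for labeled points
    F = np.full((n, n_classes), 1.0 / n_classes)  # uniform start
    F[labeled] = Y

    for _ in range(n_iter):
        F = P @ F          # smooth distributions along graph edges
        F[labeled] = Y     # clamp labeled points to their known labels
    return F
```

In this sketch the graph acts as the constraint: an unlabeled point can only take on the label distributions of the points it is connected to, so labels spread within densely connected regions and not across weak or absent edges.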
