Abstract
We present a framework that improves real-time speech recognition performance using deep neural networks (DNNs) with auxiliary Gaussian mixture models (GMMs). The DNNs and the auxiliary GMMs share the same hidden Markov model (HMM) state inventory. First, online incremental feature-space adaptation is performed using the GMM acoustic model. The resulting speaker-adapted features are used to improve the recognition performance of both the GMM and the DNN models. Second, the acoustic scores from the GMMs and DNNs are combined at the state level during decoding. Experiments on a large-vocabulary speech recognition task show that both approaches improve recognition performance consistently and that the gains are mostly additive, resulting in about 5% relative improvement over the competitive DNN baseline in both Portuguese and English systems.
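As an illustration of state-level score combination, the following minimal sketch interpolates per-state log-scores from the two models. The abstract does not specify the combination rule, so the log-linear form, the function name `combine_state_scores`, and the weight `dnn_weight` are assumptions made here for illustration only; the one property taken from the text is that both models score the same HMM state inventory.

```python
def combine_state_scores(gmm_scores, dnn_scores, dnn_weight=0.5):
    """Combine per-state acoustic log-scores from a GMM and a DNN.

    Assumption: a log-linear interpolation with a single global weight.
    This is a hypothetical rule, not necessarily the one used in the paper.
    """
    # Both models must score the same HMM state inventory.
    if gmm_scores.keys() != dnn_scores.keys():
        raise ValueError("GMM and DNN must share the same HMM state inventory")
    if not 0.0 <= dnn_weight <= 1.0:
        raise ValueError("dnn_weight must lie in [0, 1]")
    return {
        state: dnn_weight * dnn_scores[state]
        + (1.0 - dnn_weight) * gmm_scores[state]
        for state in gmm_scores
    }


# Toy log-scores for one frame over two shared HMM states (made-up values).
gmm = {"s1": -4.0, "s2": -2.0}
dnn = {"s1": -1.0, "s2": -3.0}
combined = combine_state_scores(gmm, dnn, dnn_weight=0.5)
```

In a decoder, a function of this kind would be applied per frame before the search step consumes the acoustic scores; in practice the weight would be tuned on held-out data.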