Abstract

Recently, the context-dependent (CD) deep neural network (DNN) hidden Markov model (HMM) has achieved significant improvements on many automatic speech recognition (ASR) tasks. In the standard CD-DNN-HMM training procedure, a Gaussian mixture model (GMM) based ASR system must first be built to pre-segment the training data and to define the CD states that serve as the DNN targets. In this paper, we propose a novel decision-tree-based state tying procedure in which state embeddings derived from the DNN are clustered to minimize the sum-of-squared error, so a GMM is no longer needed to define the CD-DNN targets. In addition, we introduce a training procedure for CD-DNN-HMM in which the forward-backward algorithm is used for context-independent (CI) DNN-HMM training and the proposed state tying approach defines the CD-DNN targets. Experiments conducted on a 30-hour Chinese broadcast news speech database show that the proposed DNN-based state tying approach yields performance comparable to the GMM-based one.
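The core idea of the tying step is to group CD-state embeddings so that the total sum-of-squared error (SSE) to each group's centroid is small. The sketch below illustrates that SSE criterion with plain k-means over toy embedding vectors; it is only a minimal illustration, not the paper's decision-tree procedure, and the function names (`kmeans_tie`, `sse`) and deterministic farthest-point initialization are assumptions for this example.

```python
import numpy as np

def sse(cluster):
    # Sum of squared distances of the members to the cluster centroid.
    centroid = cluster.mean(axis=0)
    return float(((cluster - centroid) ** 2).sum())

def kmeans_tie(embeddings, k, n_iter=50):
    """Group state embeddings into k tied states by locally minimizing
    the total SSE (illustrative k-means stand-in for the paper's
    decision-tree clustering, which uses the same SSE objective)."""
    # Deterministic farthest-point initialization of the k centroids.
    centroids = [embeddings[0]]
    for _ in range(k - 1):
        d = np.min([((embeddings - c) ** 2).sum(-1) for c in centroids], axis=0)
        centroids.append(embeddings[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each embedding to its nearest centroid.
        d = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster goes empty.
        new = np.array([embeddings[labels == j].mean(axis=0)
                        if (labels == j).any() else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Each tied state would then share one set of DNN output targets, with the per-cluster SSE measuring how much acoustic detail the tying discards.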
