Abstract

Recently, the context-dependent (CD) deep neural network (DNN) hidden Markov model (HMM) has achieved significant improvements on many automatic speech recognition (ASR) tasks. In the standard CD-DNN-HMM training procedure, a Gaussian mixture model (GMM) based ASR system must first be built to pre-segment the training data and to define the CD states that serve as the DNN targets. In this paper, we propose a novel decision-tree-based state tying procedure in which state embeddings derived from the DNN are clustered to minimize the sum-of-squared error, so a GMM is no longer needed to define the CD-DNN targets. In addition, we introduce a training procedure for CD-DNN-HMM in which the forward-backward algorithm is used for context-independent (CI) DNN-HMM training and the proposed state tying approach defines the CD-DNN targets. Experiments conducted on a 30-hour Chinese broadcast news speech database show that the proposed DNN-based state tying approach yields performance comparable to the GMM-based one.
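The core idea of the tying step is to group CD-state embeddings so that the total sum-of-squared error (SSE) to each group's centroid is small. The sketch below illustrates that SSE criterion with plain k-means over toy embedding vectors; it is only a minimal illustration, not the paper's decision-tree procedure, and the function names (`kmeans_tie`, `sse`) and deterministic farthest-point initialization are assumptions for this example.

```python
import numpy as np

def sse(cluster):
    # Sum of squared distances of the members to the cluster centroid.
    centroid = cluster.mean(axis=0)
    return float(((cluster - centroid) ** 2).sum())

def kmeans_tie(embeddings, k, n_iter=50):
    """Group state embeddings into k tied states by locally minimizing
    the total SSE (illustrative k-means stand-in for the paper's
    decision-tree clustering, which uses the same SSE objective)."""
    # Deterministic farthest-point initialization of the k centroids.
    centroids = [embeddings[0]]
    for _ in range(k - 1):
        d = np.min([((embeddings - c) ** 2).sum(-1) for c in centroids], axis=0)
        centroids.append(embeddings[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each embedding to its nearest centroid.
        d = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster goes empty.
        new = np.array([embeddings[labels == j].mean(axis=0)
                        if (labels == j).any() else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Each tied state would then share one set of DNN output targets, with the per-cluster SSE measuring how much acoustic detail the tying discards.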
