Abstract

With recent advances and efficient training procedures, deep neural networks (DNNs) have significantly outperformed the hidden Markov model-Gaussian mixture model (HMM-GMM) approach. The performance of a DNN can be improved further if it is given better phonetic context information. Such information is captured by the state-specific vectors (SSVs) of the subspace Gaussian mixture model (SGMM). In this paper, we use the state-specific vectors of the SGMM as features to provide additional phonetic context information to the DNN framework. The state-specific vectors are aligned with each observation vector of the training data to form the SSV feature set. The combination of linear discriminant analysis (LDA) feature sets and SSV feature sets is then used as the input to train the DNN. A relative improvement of up to 4.13% is obtained on a Hindi database using a DNN trained with the combination of SSV and LDA feature sets, compared with a DNN trained only with LDA feature sets. Because the state-specific vectors carry extra information about the phonetic context, they improve results when combined with the DNN framework. We also investigate speech recognition performance under different training data selection strategies. The idea is to implement an approach that maximizes the information content of the training corpus. The experiments in this paper are carried out on the training data set with maximum information content.
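
The feature construction described above can be illustrated with a minimal sketch. It assumes that each training frame already carries the index of its aligned HMM state (for example from a forced alignment, which the abstract does not specify) and that the SGMM supplies one state-specific vector per state. The function name `append_state_vectors`, the toy dimensions, and the values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def append_state_vectors(lda_feats, state_ids, state_vectors):
    """Concatenate each frame's LDA features with the SSV of its aligned state.

    lda_feats:     list of per-frame LDA feature vectors (one list per frame)
    state_ids:     list of aligned state indices, one per frame (assumed given)
    state_vectors: list of state-specific vectors, indexed by state
    """
    if len(lda_feats) != len(state_ids):
        raise ValueError("need exactly one state index per frame")
    combined = []
    for frame, state in zip(lda_feats, state_ids):
        # The SSV is constant for a given state, so frames that share a
        # state receive the same appended vector.
        combined.append(list(frame) + list(state_vectors[state]))
    return combined


# Toy example (hypothetical values): 3 frames, 2-dim LDA features,
# 2 states with 1-dim state-specific vectors.
lda_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
state_ids = [1, 1, 0]
state_vectors = [[9.0], [7.0]]
dnn_input = append_state_vectors(lda_feats, state_ids, state_vectors)
```

Each row of `dnn_input` would then serve as one input vector to the DNN, with the appended dimensions carrying the phonetic context of the aligned state.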
