Abstract
We propose using sparse inverse covariance matrices for acoustic model training when training data are insufficient. Acoustic models trained with inadequate data tend to overfit and generalize poorly to unseen test data, especially when full covariance matrices are used. We address this problem by adding an L1 regularization term to the traditional maximum likelihood objective function in order to penalize complex models. Under this new objective function, the structure of the inverse covariance matrices is sparsified automatically. The Expectation Maximization algorithm is used to learn the parameters of the hidden Markov model under the new objective function. We show that the training procedure for every hidden Markov model parameter is the same as in maximum likelihood estimation, except for the inverse covariance matrices. The update for the inverse covariance matrices is a concave optimization problem and can be solved efficiently. Our experiments show that the proposed method correctly learns the underlying correlations among the random variables of the speech feature vector. On the Wall Street Journal data, with only about 14 hours of training data available, the proposed model significantly outperforms the diagonal covariance model and the full covariance model, with relative improvements in recognition accuracy of 10.9% and 16.5%, respectively. On our own collected low-resource language data, a Cantonese data set, the proposed model also significantly outperforms both the diagonal covariance model and the full covariance model.
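As an illustration of the kind of update described above, the L1-penalized problem for the inverse covariance matrix of a single Gaussian component can be sketched in the standard sparse inverse covariance form. This is an assumed form, not one stated in the abstract: the symbols $P_j$ (inverse covariance of component $j$), $S_j$ (the posterior-weighted sample covariance accumulated in the E-step), and $\rho \ge 0$ (the regularization weight) are introduced here for illustration only, and whether the diagonal entries are penalized is left open.

```latex
% Assumed notation (not from the abstract): P_j, S_j, \rho
\hat{P}_j \;=\; \arg\max_{P \succ 0} \;
    \log\det P \;-\; \operatorname{tr}\!\left(S_j P\right) \;-\; \rho \,\lVert P \rVert_1,
\qquad
\lVert P \rVert_1 \;=\; \sum_{a,b} \lvert P_{ab} \rvert
```

In this form, $\log\det P$ is concave on positive definite matrices, the trace term is linear, and the negated L1 norm is concave, so the whole objective is concave. With $\rho = 0$ and $S_j$ invertible, the solution reduces to the full covariance maximum likelihood estimate $\hat{P}_j = S_j^{-1}$. Larger values of $\rho$ drive more entries of $P$ to exactly zero.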