Maximum Likelihood Nonlinear Transformations Based on Deep Neural Networks

Xiaodong Cui, Vaibhava Goel

doi:10.1109/taslp.2016.2594255

Abstract

Feature transformations are commonly used in speech recognition to account for distribution mismatches between the source and target domains, also referred to as covariate shift. Linear affine or piecewise linear transformations are typically considered. In this paper, we present nonlinear feature transformations based on deep neural networks (DNNs), estimated under the maximum likelihood criterion. We use the hidden Markov model (HMM) to model speech feature sequences, and the features in each HMM state are assumed to follow a Gaussian mixture model (GMM) distribution. The network is pre-trained to be close to a linear transformation and then fine-tuned using the gradient descent algorithm. Because of the nonlinearity, the gradients and the partition functions of the GMM-HMM state distributions are evaluated using the Monte Carlo (MC) method based on importance sampling. In addition, a deep stacked architecture is proposed to hierarchically build a DNN as a series of sub-networks, each representing a nonlinear transformation itself, which can be learned using a block-wise learning strategy. Applications of the proposed nonlinear transformations to speaker/environment adaptation and acoustic modeling in large vocabulary continuous speech recognition tasks show their superior performance over the widely used constrained maximum likelihood linear regression (CMLLR).
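The abstract does not spell out the exact form of the state distributions, so what follows is only a minimal sketch under stated assumptions. It assumes a transformed state density p(x) = GMM(f(x)) / Z, where f is the nonlinear feature transformation and Z = ∫ GMM(f(x)) dx is the partition function. For a nonlinear f, Z generally has no closed form, so it is estimated by importance sampling: draw x_i from a proposal q and average GMM(f(x_i)) / q(x_i). The 1-D setting, the Gaussian proposal, the example GMM parameters, and the names `gmm_pdf`, `normal_pdf`, and `partition_function_is` are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def gmm_pdf(y, weights, means, variances):
    """Density of a 1-D Gaussian mixture evaluated at y."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * math.exp(-0.5 * (y - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return total


def normal_pdf(x, mean, std):
    """Density of the Gaussian proposal N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def partition_function_is(f, gmm, proposal_mean, proposal_std, n_samples=100_000, seed=0):
    """Importance-sampling estimate of Z = integral of GMM(f(x)) dx.

    Assumption (hypothetical): the state density is GMM(f(x)) / Z.
    Samples x_i ~ q = N(proposal_mean, proposal_std^2) and returns the
    sample mean of the importance weights GMM(f(x_i)) / q(x_i).
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x = rng.gauss(proposal_mean, proposal_std)
        acc += gmm_pdf(f(x), *gmm) / normal_pdf(x, proposal_mean, proposal_std)
    return acc / n_samples


if __name__ == "__main__":
    # Hypothetical 2-component GMM (weights, means, variances) in the transformed space.
    gmm = ([0.3, 0.7], [-1.0, 2.0], [0.5, 1.0])
    # Linear case f(x) = 2x + 1: the exact partition function is 1/|a| = 0.5.
    print(partition_function_is(lambda x: 2.0 * x + 1.0, gmm, 0.0, 2.0))
    # A mildly nonlinear map close to the identity (hypothetical), where Z has no closed form.
    print(partition_function_is(lambda x: x + 0.5 * math.tanh(x), gmm, 0.0, 3.0))
```

The linear case is a useful sanity check. For f(x) = a x + b the exact value is Z = 1/|a|, so p(x) = |a| GMM(a x + b), which is the familiar Jacobian-weighted likelihood of a CMLLR-style affine transform. The proposal is chosen wider than the target so that the importance weights stay bounded; a poorly matched proposal would make the estimate high-variance.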

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Nov 1, 2016
Citations: 3

Similar Papers

A study on speaker normalized MLP features in LVCSR
Zoltán Tüske ... Ralf Schlüter
27 Aug 2011

Generalized discriminative feature transformation for speech recognition
Roger Hsiao ... Tanja Schultz
06 Sep 2009

Missing feature reconstruction and acoustic model adaptation combined for large vocabulary continuous speech recognition
...
25 Aug 2008

Comparison of Grapheme and Phoneme Based Acoustic Modeling in LVCSR Task in Slovak
Michal Mirilovič ... Anton Čižmár
01 Jan 2009
