Abstract

Acoustic models based on Deep Neural Networks (DNNs) have led to significant improvements in recognition accuracy. In these methods, Hidden Markov Model (HMM) state scores are computed using flexible discriminative DNNs. Conditional Random Fields (CRFs), on the other hand, are undirected graphical models that retain the Markov properties of HMMs and are formulated using the maximum entropy (MaxEnt) principle. CRFs have a limited ability to model spectral phenomena because they use a single quadratic activation function per state. It is both possible and natural to use DNNs to compute the state scores in CRFs; the resulting acoustic models are known as Deep Conditional Random Fields (DCRFs). In this work, a variant of DCRFs is presented and its connections with hybrid DNN/HMM systems are established. Under certain assumptions, DCRFs and hybrid DNN/HMM systems can produce exactly the same results on a phone recognition task. In addition, linear activation functions are used in the DCRF output layer. Consequently, DCRFs and traditional DNN/HMM systems have the same decoding speed.
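As an illustration of why state scores taken from a linear output layer need not change decoding, the following is a minimal sketch and not the authors' implementation. It assumes a generic Viterbi decoder over per-frame state scores and log-domain transition scores; the names `viterbi`, `log_softmax`, `linear_scores`, and `trans` are hypothetical, the scores are random stand-ins for DNN outputs, and state priors are ignored (equivalent to assuming uniform priors). Because a log-softmax differs from the linear activations only by a per-frame constant, the best path is the same in both cases.

```python
import numpy as np

def viterbi(frame_scores, trans_scores):
    """Best state path for scores of shape (T, S) and transitions (S, S).

    trans_scores[i, j] is the log-domain score of moving from state i to j.
    """
    T, S = frame_scores.shape
    delta = frame_scores[0].copy()           # best score ending in each state
    backptr = np.zeros((T, S), dtype=int)    # best predecessor per frame/state
    for t in range(1, T):
        cand = delta[:, None] + trans_scores     # cand[i, j]: come from i, go to j
        backptr[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + frame_scores[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace the best path backwards
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

def log_softmax(z):
    """Row-wise log-softmax: subtracts a per-frame log-sum-exp from z."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

# Random stand-ins (assumption): T frames, S states.
rng = np.random.default_rng(0)
T, S = 20, 5
linear_scores = rng.normal(size=(T, S))  # hypothetical linear output-layer activations
trans = rng.normal(size=(S, S))          # hypothetical transition scores

path_linear = viterbi(linear_scores, trans)
path_softmax = viterbi(log_softmax(linear_scores), trans)
```

Under these assumptions the two paths coincide, and the decoder performs the same work per frame regardless of which score type it is given.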
