Abstract

Deep neural networks (DNNs) have recently been successful in many applications and have become a popular approach for speech recognition. Training a DNN model for speech recognition is computationally expensive because of the model's large number of parameters. Pre-training improves DNN modeling; however, learning suffers when pre-training is inefficient. This paper introduces a new pre-training framework that utilizes label information in the lower layers (the layers near the input) for better recognition. The proposed pre-training method dynamically injects discriminative information not only into the last layer but also into the other layers. In this algorithm, the lower layers retain more generative information while the higher layers capture more discriminative information. In addition, the method exploits speaker information through the Subspace Gaussian Mixture Model (SGMM), which improves recognition accuracy. Experimental results on the TIMIT, MNIST, Switchboard, and English Broadcast News datasets show that this approach significantly outperforms current state-of-the-art methods such as the Deep Belief Network and the Deep Boltzmann Machine. Moreover, the proposed algorithm has minimal memory requirements.
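To make the idea of depth-dependent pre-training concrete, the following is a minimal PyTorch sketch, not the paper's actual method: each layer is trained greedily with a blended objective whose discriminative (label) weight grows with depth, so lower layers remain mostly generative (reconstruction) while higher layers become mostly discriminative. The layer sizes, the linear blending schedule, the autoencoder-style reconstruction term, and the function name `pretrain_layers` are all illustrative assumptions; the SGMM speaker-information component is not shown.

```python
# Hypothetical sketch of hybrid generative/discriminative layer-wise pre-training.
# Assumption: the discriminative weight alpha increases linearly with layer depth,
# so lower layers are trained mostly on reconstruction and higher layers mostly
# on label prediction. This is an illustration, not the paper's exact algorithm.

import torch
import torch.nn as nn


def pretrain_layers(x, y, layer_sizes, num_classes, epochs=5, lr=1e-3):
    """Greedy layer-wise pre-training with a depth-dependent objective mix."""
    layers = []
    h = x
    n = len(layer_sizes)
    for depth, out_dim in enumerate(layer_sizes):
        in_dim = h.shape[1]
        encoder = nn.Linear(in_dim, out_dim)
        decoder = nn.Linear(out_dim, in_dim)          # generative (reconstruction) head
        classifier = nn.Linear(out_dim, num_classes)  # discriminative (label) head

        # Discriminative weight grows with depth (assumed linear schedule).
        alpha = (depth + 1) / n

        opt = torch.optim.Adam(
            list(encoder.parameters())
            + list(decoder.parameters())
            + list(classifier.parameters()),
            lr=lr,
        )

        for _ in range(epochs):
            z = torch.sigmoid(encoder(h))
            recon_loss = nn.functional.mse_loss(decoder(z), h)
            disc_loss = nn.functional.cross_entropy(classifier(z), y)
            loss = (1 - alpha) * recon_loss + alpha * disc_loss
            opt.zero_grad()
            loss.backward()
            opt.step()

        layers.append(encoder)
        # Freeze this layer's output as the input to the next layer.
        with torch.no_grad():
            h = torch.sigmoid(encoder(h))
    return layers


if __name__ == "__main__":
    # Random placeholder data standing in for acoustic features and labels.
    x = torch.randn(256, 64)
    y = torch.randint(0, 10, (256,))
    stack = pretrain_layers(x, y, layer_sizes=[128, 128, 64], num_classes=10)
```

After pre-training, the stacked encoders would typically initialize a full DNN that is then fine-tuned end-to-end on the recognition task.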
