Abstract

This paper presents a two-step initialization algorithm for training acoustic models based on deep neural networks. The algorithm aims to reduce the impact of non-speech segments on acoustic model training by lowering the proportion of non-speech examples in the training set. It was evaluated on English spontaneous telephone speech recognition (Switchboard). Compared with training initialization by restricted Boltzmann machines, the proposed algorithm yielded a 3% relative reduction in word error rate. The results can be applied in the development of automatic speech recognition systems.
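
The core idea stated above, lowering the share of non-speech examples in the training set, can be illustrated with a minimal sketch. This is not the paper's two-step algorithm, whose details the abstract does not give. It only shows one plausible way to subsample non-speech frames, assuming frame-level labels are available. The function name `subsample_non_speech`, the parameter `keep_fraction`, and the toy data are all hypothetical.

```python
import random


def subsample_non_speech(frames, labels, non_speech_label, keep_fraction, seed=0):
    """Keep every speech frame and a random fraction of the non-speech frames.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Indices of all frames labeled as non-speech.
    non_speech_idx = [i for i, y in enumerate(labels) if y == non_speech_label]
    # Number of non-speech frames to retain.
    n_keep = int(round(keep_fraction * len(non_speech_idx)))
    kept = set(rng.sample(non_speech_idx, n_keep))
    # Preserve the original frame order in the reduced set.
    selected = [
        i for i, y in enumerate(labels) if y != non_speech_label or i in kept
    ]
    return [frames[i] for i in selected], [labels[i] for i in selected]


# Toy data (assumed): 60 non-speech frames (label 0) and 40 speech frames (label 1).
frames = list(range(100))
labels = [0] * 60 + [1] * 40

sub_frames, sub_labels = subsample_non_speech(
    frames, labels, non_speech_label=0, keep_fraction=0.25
)

share_before = labels.count(0) / len(labels)
share_after = sub_labels.count(0) / len(sub_labels)
print(f"non-speech share: {share_before:.2f} -> {share_after:.2f}")
```

In this toy setup the non-speech share drops from 60% to roughly 27% while all speech frames are retained. How the reduced set is then used across the two initialization steps is specific to the paper and is not modeled here.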
