Abstract

Overfitting is a common issue in automatic speech recognition and is especially impactful when the amount of training data is limited. To address this problem, this article investigates acoustic modeling through Multi-Task Learning with two speaker-related auxiliary tasks. Multi-Task Learning is a regularization method that aims to improve the network's generalization ability by training a single model to solve several different, but related, tasks. In this article, two auxiliary tasks are examined jointly. On the one hand, we consider speaker classification as an auxiliary task by training the acoustic model to recognize the speaker, or to find the closest one within the training set. On the other hand, the acoustic model is also trained to extract i-vectors from the standard acoustic features. I-vectors are used effectively in the speaker identification community to characterize a speaker and their acoustic environment. The core idea of using these auxiliary tasks is to give the network additional inter-speaker awareness and thus reduce overfitting. We investigate this Multi-Task Learning setup on the TIMIT database, with acoustic modeling performed by a Recurrent Neural Network with Long Short-Term Memory cells.
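The setup described above can be sketched as a shared LSTM trunk with one main head (phone classification) and two auxiliary heads (speaker classification and i-vector regression). The following is a minimal illustrative sketch in PyTorch; all layer sizes, head names, and the mean-pooling choice are assumptions for illustration, not the paper's exact architecture (461 is the usual count of TIMIT training speakers and 61 its phone-set size, but the paper's precise configuration may differ).

```python
import torch
import torch.nn as nn

class MultiTaskAcousticModel(nn.Module):
    """Hypothetical multi-task LSTM acoustic model (sizes are illustrative)."""
    def __init__(self, n_features=40, n_hidden=128,
                 n_phones=61, n_speakers=461, ivector_dim=100):
        super().__init__()
        # Shared recurrent trunk over acoustic feature frames
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        # Main task: frame-level phone classification
        self.phone_head = nn.Linear(n_hidden, n_phones)
        # Auxiliary task 1: classify the speaker among training speakers
        self.speaker_head = nn.Linear(n_hidden, n_speakers)
        # Auxiliary task 2: regress an i-vector from the shared representation
        self.ivector_head = nn.Linear(n_hidden, ivector_dim)

    def forward(self, x):
        h, _ = self.lstm(x)          # h: (batch, time, hidden)
        pooled = h.mean(dim=1)       # utterance-level pooling for the aux heads
        return (self.phone_head(h),
                self.speaker_head(pooled),
                self.ivector_head(pooled))

# Forward pass on a toy batch: 2 utterances of 50 frames, 40 features each
model = MultiTaskAcousticModel()
x = torch.randn(2, 50, 40)
phones, speakers, ivecs = model(x)
print(phones.shape, speakers.shape, ivecs.shape)
```

Training would then minimize a weighted sum of the main cross-entropy loss and the two auxiliary losses (cross-entropy for speaker classification, e.g. mean squared error for i-vector regression), so that gradients from the speaker-related tasks regularize the shared trunk.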
