Abstract

Connectionist temporal classification (CTC) is a sequence-level loss that has been successfully applied to train recurrent neural network (RNN) models for automatic speech recognition. However, one major weakness of CTC is its conditional independence assumption, which makes it difficult for the model to learn label dependencies. In this paper, we propose stimulated CTC, which uses stimulated learning to help CTC models learn label dependencies implicitly, using an auxiliary RNN to generate the appropriate stimuli. These stimuli take the form of an additional stimulation loss term that encourages the model to learn said label dependencies. The auxiliary network is used only during training, and the inference model has the same structure as a standard CTC model. The proposed stimulated CTC model achieves about 35% relative character error rate improvement on a synthetic gesture keyboard recognition task and over 30% relative word error rate improvement on the LibriSpeech automatic speech recognition tasks over a baseline model trained with CTC only.
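
The combined objective described above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the paper's exact formulation: the stimulation loss is shown here as a mean-squared distance between the acoustic model's hidden activations and the stimuli produced by the auxiliary RNN, and the class name `StimulatedCTCLoss` and the weight `stim_weight` are hypothetical.

```python
# Minimal sketch of a stimulated CTC training objective (hypothetical
# formulation): standard CTC loss plus a weighted stimulation term that
# pulls the model's hidden activations toward stimuli generated by an
# auxiliary RNN. The auxiliary network is used only during training;
# at inference time the model is a standard CTC model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StimulatedCTCLoss(nn.Module):
    def __init__(self, blank=0, stim_weight=0.1):
        super().__init__()
        self.ctc = nn.CTCLoss(blank=blank, zero_infinity=True)
        self.stim_weight = stim_weight  # weight on the stimulation term (assumed)

    def forward(self, log_probs, targets, input_lengths, target_lengths,
                hidden, stimuli):
        # Standard CTC loss on the model's output distribution.
        # log_probs: (T, N, C) log-softmax outputs of the acoustic model.
        ctc_loss = self.ctc(log_probs, targets, input_lengths, target_lengths)
        # Stimulation loss: encourage hidden activations to match the
        # auxiliary RNN's stimuli, so label dependencies are learned
        # implicitly (MSE is an assumed choice of distance).
        stim_loss = F.mse_loss(hidden, stimuli)
        return ctc_loss + self.stim_weight * stim_loss
```

At inference the stimulation branch is simply dropped, so the deployed model incurs no extra cost relative to a model trained with CTC alone.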
