Abstract

Due to the conditional independence assumption of a CTC model, a language model is usually added to improve its speech recognition performance. However, adding a language model increases complexity and computation cost. We therefore propose a simple and effective speech recognition method based on a CTC multilayer loss. Unlike a traditional CTC model, which optimizes only the CTC loss of the last layer, our method trains the model with a multilayer loss obtained as a weighted sum of the CTC losses of different layers. Optimizing the losses of several layers lets the model exploit information from different depths, so the information captured is more comprehensive and the resulting model recognizes speech more accurately. With only a small amount of code modification, this CTC multilayer loss effectively regularizes CTC training and improves recognition performance. Since the method changes only the loss function of the CTC model, leaving its structure and testing process unchanged, the training stage stays simple and the testing stage incurs no extra memory or computation cost. We evaluated the method on the AISHELL-1 dataset using WeNet as the baseline; it reduced the character error rate (CER) by 7.5% and improved speech recognition performance without adding a language model.
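The combination step described above can be sketched as follows. This is a minimal, framework-agnostic illustration of weighting and summing per-layer CTC losses into one training objective; the helper name, the choice of layers, and the weight values are illustrative assumptions, not details taken from the paper.

```python
def multilayer_ctc_loss(layer_losses, weights):
    """Combine per-layer CTC losses into a single training loss by
    weighted summation (hypothetical helper; the paper's exact layer
    selection and weighting scheme are not specified in the abstract).

    layer_losses: CTC loss value computed at each chosen encoder layer.
    weights:      one non-negative weight per layer loss.
    """
    if len(layer_losses) != len(weights):
        raise ValueError("need exactly one weight per layer loss")
    total_weight = sum(weights)
    # Normalize by the total weight so the combined loss stays on the
    # same scale as a single-layer CTC loss.
    return sum(w * l for w, l in zip(weights, layer_losses)) / total_weight


# Example: equal weighting of an intermediate-layer loss and the
# final-layer loss (illustrative values only).
loss = multilayer_ctc_loss([2.0, 4.0], [0.5, 0.5])
```

In a real implementation the per-layer losses would each be computed by applying a CTC loss to the output of the corresponding encoder layer; only this combination step, not the model or decoding, changes.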
