Abstract

The accuracy of Automatic Speech Recognition (ASR) is critical to speech-based products such as subtitling, speech translation, and spoken dialogue. We aim to enhance ASR accuracy by correcting errors in ASR hypotheses with sequence-to-sequence (seq2seq) models. In this paper, we propose to boost Transformer-based ASR error correction by fusing a pre-trained BERT [1] into the encoder and a copying mechanism into the decoder, which exploit externally well-learned token representations and copy correct tokens from the ASR transcript, respectively. In addition, we leverage Text-to-Speech (TTS) synthesized data and ASR 5-best hypotheses to augment the training data and make it more diverse. We evaluate our approach on two internal test sets and two public ASR test sets. Experimental results show that the proposed approach decreases the average Character Error Rate (CER) from 9.36% to 7.30% compared with the ASR hypotheses without correction, and outperforms the Transformer-based baseline by a large margin.
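The copying mechanism mentioned above can be illustrated with a pointer-generator-style mixing step, in which the decoder's generation distribution is interpolated with a copy distribution built from attention over the source (ASR transcript) tokens. This is a minimal NumPy sketch under that assumption; the names (`p_vocab`, `attn`, `p_gen`) are illustrative and not taken from the paper.

```python
import numpy as np

def copy_mechanism_step(p_vocab, attn, source_ids, p_gen, vocab_size):
    """One decoding step of a pointer-generator-style copy mechanism (sketch).

    p_vocab    : (vocab_size,) softmax distribution over the output vocabulary
    attn       : (src_len,) attention weights over source (ASR hypothesis) tokens
    source_ids : (src_len,) vocabulary ids of the source tokens
    p_gen      : scalar in [0, 1], probability of generating vs. copying
    """
    p_copy = np.zeros(vocab_size)
    # Scatter-add attention mass onto the vocabulary ids of the source tokens,
    # so tokens already correct in the ASR transcript can be copied verbatim.
    np.add.at(p_copy, source_ids, attn)
    # Interpolate generation and copy distributions; result still sums to 1.
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy

# Toy example: a 10-word vocabulary, two source tokens both with id 3.
p_vocab = np.full(10, 0.1)
attn = np.array([0.7, 0.3])
final = copy_mechanism_step(p_vocab, attn, np.array([3, 3]), p_gen=0.5, vocab_size=10)
```

In this toy example, the source token (id 3) receives extra probability mass from the copy branch, which is how correct ASR tokens are preserved rather than re-generated.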
