Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition

Xianrui Zheng,Chao Zhang,Philip C Woodland

doi:10.1109/asru51503.2021.9688232

Abstract

Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike unidirectional LM GPT and GPT-2, BERT is bidirectional whose direct product of the output probabilities is no longer a valid language prior probability. A conversion method is proposed to compute the correct language prior probability based on bidirectional LM outputs in a mathematically exact way. Experimental results on the widely used AMI and Switchboard ASR tasks showed that the combination of the fine-tuned GPT and GPT-2 outperformed the combination of three neural LMs with different architectures trained from scratch on the in-domain text by up to a 12% relative word error rate reduction (WERR). Furthermore, on the AMI corpus, the proposed conversion for language prior probabilities enables BERT to obtain an extra 3% relative WERR, and the combination of BERT, GPT and GPT-2 results in further improvements.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition

Abstract

Talk to us

Similar Papers

Lead the way for us

Similar Papers

Bidirectional encoders to state-of-the-art: a review of BERT and its transformative impact on natural language processing
Rajesh Gupta
Информатика. Экономика. Управление - Informatics. Economics. Management | VOL. 3
Rajesh GuptaRajesh Gupta
02 Mar 2024
Информатика. Экономика. Управление - Informatics. Economics. Management | VOL. 3

T-BERT:臺灣語言模型–以臺灣在地語言預訓練BERT模型

-

01 Jan 2020
01 Jan 2020

Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Tilahun Yeshambel ... Josiane Mothe
Information | VOL. 14
Tilahun Yeshambel, et. al.Tilahun Yeshambel ... Josiane Mothe
20 Mar 2023
Information | VOL. 14

Oversampling effect in pretraining for bidirectional encoder representations from transformers (BERT) to localize medical BERT and enhance biomedical BERT
Shoya Wada ... Yasushi Matsumura
Artificial Intelligence In Medicine | VOL. 153
Shoya Wada, et. al.Shoya Wada ... Yasushi Matsumura
05 May 2024
Artificial Intelligence In Medicine | VOL. 153

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition

Abstract

Talk to us

Similar Papers