Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition

Qiujia Li, Chao Zhang, Philip C Woodland

doi:10.2139/ssrn.4194372

Abstract

Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis. Frame-synchronous systems, such as traditional hidden Markov model systems, can easily incorporate existing knowledge and can support streaming ASR applications. Label-synchronous systems, based on attention-based encoder-decoder models, can jointly learn the acoustic and language information with a single model, and can therefore be regarded as audio-grounded language models. In this paper, we propose rescoring the N-best hypotheses or lattices produced by a first-pass frame-synchronous system with a label-synchronous system in a second pass. By exploiting the complementary modelling of the different approaches, the combined two-pass systems achieve competitive performance on two standard ASR tasks without using any extra speech or text data. For the 80-hour AMI IHM dataset, the combined system has a 13.7% word error rate (WER) on the evaluation set, which is up to a 29% relative WER reduction over the individual systems. For the 300-hour Switchboard dataset, the WERs of the combined system are 5.7% and 12.1% on the Switchboard and CallHome subsets of Hub5'00, and 13.2% and 7.6% on the Switchboard Cellular and Fisher subsets of RT03, up to a 33% relative reduction in WER over the individual systems.
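
The two-pass idea described in the abstract can be illustrated with a minimal Python sketch of N-best rescoring. This is not the authors' implementation: the abstract does not say how the two systems' scores are combined, so the log-linear interpolation, the `weight` parameter, the `Hypothesis` type, and the toy second-pass scorer below are all assumptions made for illustration. The helper `relative_wer_reduction` only shows how a relative WER reduction is computed from two WER values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    """One entry of a first-pass N-best list (hypothetical structure)."""
    words: List[str]
    first_pass_score: float  # log-domain score from the frame-synchronous system


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    second_pass_scorer: Callable[[List[str]], float],
    weight: float = 0.5,
) -> Hypothesis:
    """Return the hypothesis with the best interpolated score.

    Assumption: scores are combined by log-linear interpolation,
    (1 - weight) * first_pass + weight * second_pass. The abstract
    does not specify the actual combination rule.
    """
    if not nbest:
        raise ValueError("N-best list is empty")

    def combined(hyp: Hypothesis) -> float:
        return (1.0 - weight) * hyp.first_pass_score + weight * second_pass_scorer(hyp.words)

    return max(nbest, key=combined)


def relative_wer_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Relative WER reduction of the combined system over a baseline."""
    return (baseline_wer - combined_wer) / baseline_wer


# Toy stand-in for a label-synchronous model: a lookup table of
# made-up log-probabilities keyed by the hypothesis text.
TOY_SECOND_PASS = {"i scream": -8.0, "ice cream": -5.0}


def toy_scorer(words: List[str]) -> float:
    return TOY_SECOND_PASS[" ".join(words)]


nbest = [
    Hypothesis(words=["i", "scream"], first_pass_score=-10.0),
    Hypothesis(words=["ice", "cream"], first_pass_score=-11.0),
]
best = rescore_nbest(nbest, toy_scorer, weight=0.5)
```

In this toy list the first pass alone prefers the first hypothesis, while the interpolated score prefers the second, which is the kind of complementary behaviour a second pass is meant to exploit. In practice the interpolation weight would be tuned on held-out data.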
