Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring

Qiujia Li, Chao Zhang, Philip C Woodland

doi:10.1016/j.specom.2022.12.002

Abstract

The traditional hybrid deep neural network (DNN)–hidden Markov model (HMM) system and the attention-based encoder–decoder (AED) model are both commonly used automatic speech recognition (ASR) approaches with distinct characteristics and advantages. While hybrid systems operate on a per-frame basis and are highly modularised to leverage external phonetic and linguistic knowledge, AED models operate on a per-label basis and jointly learn the acoustic and language information using a single model in an end-to-end trainable fashion. In this paper, we propose combining these two approaches in a two-pass rescoring framework. The first pass uses hybrid ASR systems to facilitate streaming and controllable ASR, and the second pass rescores the N-best hypotheses or lattices produced by the first-pass hybrid DNN-HMM system with AED models. We also propose an improved algorithm for lattice rescoring with AED models. Experiments show that the combined two-pass systems achieve competitive performance without using extra speech or text data on two standard ASR tasks. For the 80-hour AMI IHM dataset, the combined system has a 13.7% word error rate (WER) on the evaluation set, which is up to a 29% relative WER reduction over the individual systems. For the 300-hour Switchboard dataset, the WERs of the combined system are 5.7% and 12.1% on the Switchboard and CallHome subsets of Hub5’00, and 13.2% and 7.6% on the Switchboard Cellular and Fisher subsets of RT03, which is up to a 33% relative reduction in WER over the individual systems.
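
As a rough illustration of the second pass on N-best lists, the sketch below re-ranks first-pass hypotheses by adding a weighted second-pass score to each first-pass score. It is only a generic log-linear combination, not the paper's algorithm: the abstract does not specify how scores are combined, and the improved lattice rescoring method is not covered here. The names `rescore_nbest`, `aed_log_prob`, and `aed_weight` are hypothetical, and the toy scores are invented purely to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (word sequence, first-pass log score).
# Assumption: the first-pass score already combines the hybrid system's
# acoustic and language model scores in the log domain.
Hypothesis = Tuple[List[str], float]


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    aed_log_prob: Callable[[List[str]], float],
    aed_weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank an N-best list by first-pass score + aed_weight * AED score.

    `aed_log_prob` is a hypothetical callable standing in for a second-pass
    model that returns the log probability of a word sequence given the audio.
    `aed_weight` is an assumed interpolation weight; in practice it would be
    tuned on held-out data.
    """
    if not nbest:
        raise ValueError("N-best list is empty")
    rescored = [
        (words, first_pass + aed_weight * aed_log_prob(words))
        for words, first_pass in nbest
    ]
    # Higher combined log score is better, so sort in descending order.
    return sorted(rescored, key=lambda hyp: hyp[1], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_nbest = [(["i", "scream"], -4.0), (["ice", "cream"], -4.5)]
    toy_scores = {"i scream": -3.0, "ice cream": -1.0}
    ranked = rescore_nbest(toy_nbest, lambda w: toy_scores[" ".join(w)])
    print(ranked[0][0])
```

With `aed_weight=0.0` the first-pass ranking is returned unchanged, so the weight controls how much the second-pass model is allowed to override the hybrid system.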


Journal: Speech Communication
Publication Date: Dec 24, 2022
Citations: 5
License type: CC BY


Similar Papers

Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition
Qiujia Li ... Philip C Woodland
SSRN Electronic Journal
01 Jan 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
Zhong Meng ... Yashesh Gaur
09 Nov 2020

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems
Kartik Audhkhasi ... Shrikanth S Narayanan
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22
01 Mar 2014

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data
Ye Bai ... Jiangyan Yi
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 29
01 Jan 2020
