Don’t Be So Sure! Boosting ASR Decoding via Confidence Relaxation

Tomer Wullach, Shlomo E Chazan

doi:10.1609/aaai.v37i11.26614

Abstract

Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy that aims to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search, which seeks the transcript with the greatest likelihood computed using the predicted distribution. While showing substantial performance gains in various tasks, beam search loses some of its effectiveness when the predicted probabilities are highly confident, i.e., when the predicted distribution is concentrated on a single class or very few classes. We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions that may prevent beam search from truly considering a diverse set of candidates. We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. Our proposed approach requires neither further training beyond the original fine-tuning nor additional model parameters. In fact, we find that our proposed method requires significantly less inference computation than current approaches. We propose aggregating the top M layers, potentially leveraging useful information encoded in intermediate layers, and relaxing model confidence. We demonstrate the effectiveness of our approach by conducting an empirical study on varying amounts of labeled resources and different model sizes, showing consistent improvements, particularly in low-resource scenarios.
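The abstract does not say how the top M layers are aggregated or how confidence is relaxed, so the following is only an illustrative sketch under stated assumptions: it averages the per-layer logits of the last M layers and applies temperature scaling, neither of which is confirmed to be the authors' procedure. The function names, the toy logits, and the values of `m` and `temperature` are all hypothetical. The sketch shows how such a relaxation can raise the entropy of a single frame's distribution, leaving more probability mass for alternative candidates during beam search.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a temperature above 1 flattens the output."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less confident distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def aggregate_top_layers(layer_logits, m):
    """Average the logits of the last m layers (assumed aggregation scheme)."""
    if not 1 <= m <= len(layer_logits):
        raise ValueError("m must be between 1 and the number of layers")
    top = layer_logits[-m:]
    return [sum(column) / len(top) for column in zip(*top)]


# Hypothetical logits for one audio frame over 4 output classes,
# ordered from a lower layer to the final layer.
layer_logits = [
    [1.0, 0.8, 0.2, 0.1],
    [2.5, 1.5, 0.3, 0.1],
    [9.0, 1.0, 0.2, 0.1],  # final layer: very peaked
]

# Baseline: distribution from the final layer only.
final_only = softmax(layer_logits[-1])

# Relaxed: average the top 2 layers, then soften with temperature 2.0.
relaxed = softmax(aggregate_top_layers(layer_logits, m=2), temperature=2.0)

print("final-layer entropy:", round(entropy(final_only), 4))
print("relaxed entropy:    ", round(entropy(relaxed), 4))
```

In this toy case both distributions rank the same class first, but the relaxed one assigns noticeably more mass to the runners-up, which is the property a beam search needs in order to keep diverse hypotheses alive.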

More From: Proceedings of the AAAI Conference on Artificial Intelligence

Similar Papers

Improving out of vocabulary words recognition accuracy for an end-to-end Russian speech recognition system
A.Yu Andrusenko ... A.N Romanenko
Scientific and Technical Journal of Information Technologies, Mechanics and Optics | VOL. 22
01 Dec 2022

Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling
G Thimmaraja Yadava ... H S Jayanna
International Journal of Speech Technology | VOL. 23
22 Jan 2020

Environmental Independent ASR Model Adaptation/Compensation by Bayesian Parametric Representation
Xuechuan Wang ... Douglas O'Shaughnessy
IEEE Transactions on Audio, Speech and Language Processing | VOL. 15
01 May 2007

Useful blunders: Can automated speech recognition errors improve downstream dementia classification?
Changye Li ... Serguei Pakhomov
Journal of Biomedical Informatics | VOL. 150
20 Jan 2024
