Abstract
End-to-end (E2E) Automatic Speech Recognition (ASR) systems are widely applied in various devices and communication domains. However, state-of-the-art ASR systems are known to underperform when there is a mismatch between the training and test domains. As a result, acoustic models deployed in production are often adapted to the target domain to improve accuracy. This paper proposes a method to perform unsupervised model adaptation for E2E ASR using first-pass transcriptions of adaptation data produced by the baseline ASR model itself. The paper proposes two transcription confidence measures that can be used to select an optimal in-domain adaptation set. Experiments were performed using the QuartzNet ASR architecture on the HarperValleyBank corpus. Results show that the unsupervised adaptation technique with confidence-measure-based data selection yields an 8% absolute reduction in word error rate on the HarperValleyBank test set. The proposed method can be applied to any E2E ASR system and is suitable for model adaptation on call center audio with little to no manual transcription.
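To make the selection step concrete, the sketch below illustrates one way such confidence-based filtering of first-pass transcripts could look. It is an assumption-laden illustration, not the paper's exact formulation: the confidence measure used here (mean per-frame maximum softmax posterior from the baseline model) and the `Utterance` container are hypothetical stand-ins for the two measures the paper actually proposes.

```python
# Illustrative sketch of confidence-based selection of first-pass transcripts
# for unsupervised adaptation. The confidence measure below (mean per-frame
# maximum softmax posterior) is an assumption for illustration only and is not
# necessarily one of the two measures proposed in the paper.

from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class Utterance:
    audio_path: str         # path to an adaptation-set audio file
    posteriors: np.ndarray  # (T, V) per-frame softmax outputs from the baseline model
    hypothesis: str         # first-pass (greedy) transcription from the baseline model


def utterance_confidence(posteriors: np.ndarray) -> float:
    """Mean of the per-frame maximum posterior; a simple proxy for transcription quality."""
    return float(posteriors.max(axis=1).mean())


def select_adaptation_set(utterances: List[Utterance],
                          threshold: float = 0.85) -> List[Utterance]:
    """Keep only utterances whose first-pass transcript is confidently decoded.

    The retained (audio, hypothesis) pairs serve as pseudo-labels for
    fine-tuning the E2E model on the target domain.
    """
    return [u for u in utterances if utterance_confidence(u.posteriors) >= threshold]
```

In this sketch, the threshold trades off the size of the adaptation set against the error rate of the pseudo-labels; the selected (audio, hypothesis) pairs would then be used to fine-tune the baseline model on the target domain without any manual transcription.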