Do End-to-End Speech Recognition Models Care About Context?

Lasse Borgholt, Lars Maaløe, Željko Agić, Christian Igel, Anders Søgaard, Jakob D Havtorn

doi:10.21437/interspeech.2020-1750

Open Access

Publication Date: Oct 25, 2020
Citations: 44
License type: other-oa

#Attention-based Encoder-decoder #Connectionist Temporal Classification Model

Abstract

The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and by evaluating how the models perform when we constrain the amount of contextual information in the audio input. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model.
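For readers less familiar with CTC, a minimal sketch of textbook best-path (greedy) CTC decoding illustrates the output independence behind the hypothesis above. The label at each frame is picked from that frame's scores alone and is not conditioned on previously emitted tokens, so any context must come through the encoder's representations. An AED decoder, by contrast, conditions each token on the tokens it has already produced. This is a generic illustration, not the paper's decoding setup; the function name `ctc_greedy_decode` and the choice of index 0 as the blank symbol are assumptions.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Best-path CTC decoding (hypothetical helper, for illustration only).

    frame_scores[t][k] is the encoder's score for label k at frame t.
    """
    # Pick the highest-scoring label per frame, independently of other frames' outputs
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # Collapse runs of identical consecutive labels into a single label
    collapsed = [label for label, _ in groupby(best_path)]
    # Drop blanks; a blank between two identical labels keeps them distinct
    return [label for label in collapsed if label != BLANK]
```

For frames whose per-frame best labels are `1, 1, 0, 1, 2, 2, 0`, collapsing repeats gives `1, 0, 1, 2, 0` and removing blanks yields `[1, 1, 2]`. The blank is what lets CTC emit the same label twice in a row.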
