Abstract
In this paper, we present a comprehensive study on building and adapting deep neural network-based speech recognition systems for automatic closed captioning. We develop the proposed systems by first building base automatic speech recognition (ASR) systems that are not specific to any particular show or station. These models are trained on nearly 6000 hours of broadcast news (BN) data using conventional hybrid and more recent attention-based end-to-end acoustic models. We then employ various adaptation and data augmentation strategies to further improve the trained base models. We use 535 hours of data from two independent BN sources to study how the base models can be customized. We observe up to 32% relative improvement using the proposed techniques on test sets related to, but independent of, the adaptation data. At these low word error rates (WERs), we believe the customized BN ASR systems can be used effectively for automatic closed captioning.