Abstract

The performance of Automatic Speech Recognition (ASR) systems is known to degrade considerably on far-field speech capture, and far-field ASR has consequently received substantial attention in recent years. Motivated by our recent work using Curriculum Learning (CL) based strategies to improve Speaker Identification (SID) under noisy and degraded conditions, this study proposes a novel CL based approach to improve far-field ASR. Specifically, we apply CL to the training of a Bidirectional Long Short Term Memory (BLSTM) based ASR network that uses the Connectionist Temporal Classification (CTC) objective function. Training starts with comparatively easier near-field data, and more diverse (difficult) far-field data is included progressively in the later stages of training. The proposed approaches are shown to significantly outperform the baseline BLSTM ASR system, offering relative WER reductions of up to 7.3% and 10.1% on the dev and eval sets, respectively, of the AMI far-field voice capture corpus.
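
As an illustration only, the staged data schedule and the relative WER reduction metric described above might be sketched as follows. The abstract does not specify the number of stages or how quickly far-field data is added, so the linear schedule, the function names `curriculum_pool` and `relative_wer_reduction`, and all parameters here are assumptions made for the example, not the authors' implementation.

```python
def curriculum_pool(near_field, far_field, stage, num_stages):
    """Return the training pool for one curriculum stage (0-indexed).

    Hypothetical schedule: stage 0 uses only near-field utterances, and the
    share of far-field utterances grows linearly until the final stage
    includes all of them. The linear growth is an assumption of this sketch.
    """
    if num_stages < 1 or not 0 <= stage < num_stages:
        raise ValueError("stage must satisfy 0 <= stage < num_stages")
    # Fraction of far-field data admitted at this stage, from 0.0 up to 1.0.
    frac = stage / (num_stages - 1) if num_stages > 1 else 1.0
    n_far = round(frac * len(far_field))
    return list(near_field) + list(far_field[:n_far])


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent, measured against the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    near = ["near_utt_1", "near_utt_2"]  # placeholder utterance IDs
    far = ["far_utt_1", "far_utt_2", "far_utt_3", "far_utt_4"]
    for stage in range(3):
        print(stage, curriculum_pool(near, far, stage, num_stages=3))
    # Hypothetical WER values, chosen only to show the arithmetic.
    print(relative_wer_reduction(50.0, 45.0))
```

In this sketch the easier near-field data stays in the pool at every stage, so later stages widen the training distribution rather than replacing it. Whether the original system retains near-field data throughout is not stated in the abstract.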
