Abstract
Training of mixed-bandwidth acoustic models has recently been realized by incorporating special Mel filterbanks. To fit information into every filterbank bin available across both narrowband and wideband data, these filterbanks pad the high-frequency range of narrowband data with zeros. Although these methods succeed in decreasing word error rates (WER) on wideband data, they fail to improve on narrowband signals. In this paper, we propose methods that mitigate these effects with generalized knowledge distillation. In our method, specialized teacher networks are first trained on lossless acoustic features with full-scale Mel filterbanks. While training student networks, privileged knowledge from these teacher networks is then used to compensate for the missing high-frequency information introduced by the special Mel filterbanks. We show the benefit of the proposed technique on the Aurora 4 task over traditional methods for both narrowband (10% relative WER improvement) and wideband (7.5% relative WER improvement) data.
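In knowledge-distillation setups like the one the abstract describes, the student is typically trained on a blend of a hard-label loss and a cross-entropy loss against the teacher's temperature-softened posteriors. The sketch below illustrates that generic objective only; the function names, the temperature `T`, and the interpolation weight `alpha` are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Hypothetical generalized-distillation objective:
    alpha * CE(student || teacher soft targets at temperature T)
    + (1 - alpha) * CE(student || hard label).
    """
    p_teacher = softmax(teacher_logits, T)          # teacher's softened posteriors
    log_p_student = np.log(softmax(student_logits, T))
    soft_loss = -np.sum(p_teacher * log_p_student)  # cross-entropy vs. teacher
    hard_loss = -np.log(softmax(student_logits)[label])  # standard CE vs. label
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

With `alpha = 0` this reduces to ordinary cross-entropy training; raising `alpha` lets the teacher's "privileged" posteriors (here, a teacher trained on full-bandwidth features) guide the student where its own inputs lack information.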