Abstract

This paper investigates the combination of vocal tract length normalisation and speaker adaptation in connected digit recognition, focusing on recognition in a continuously varying car noise environment. Continuous supervised speaker and environment adaptation is carried out on the test data within the Bayesian framework. The paper also evaluates several ways of implementing vocal tract length normalisation. The best performance was obtained when normalisation was applied during both the initial speaker-independent training and testing. We also observed that, during testing, speaker-specific normalisation produced better results than utterance-specific normalisation. Our experimental results on the connected digit database show that the joint approach outperforms the system in which on-line Bayesian speaker adaptation is performed on HMM mean parameters. The performance gain was particularly high for so-called outlier speakers, for whom adaptation is truly needed.
