Abstract

Vocal Tract Length Normalization (VTLN) is a well-known and widely accepted technique for reducing inter-speaker variation, and it works particularly well in clean environments. This paper examines the applicability of VTLN in noisy environments. The question we ask is whether the performance of a current state-of-the-art Automatic Speech Recognizer (ASR) can be reliably improved by applying VTLN despite a large mismatch between the operating environments (clean and noisy). Our experiments demonstrate that feature-based VTLN improves ASR performance on clean speech, and by comparison we present the drawbacks of this technique when applied to noisy speech. Feature-based VTLN in noise should therefore be applied with care and combined with dedicated environment-compensation techniques, such as adaptive filtering or energy normalization. We also discuss why feature-based VTLN is less effective for noisy speech than for clean speech.
