
The success of any commercial Automatic Speech Recognition (ASR) system depends on the availability of training data. However, its performance degrades when corpora for low-resource languages do not capture enough signal characteristics. Developing a Punjabi children's speech recognition system is one such challenge: it faces zero-resource conditions, and children's speech varies from adult speech in speaking rate and vocal tract length. In this paper, we build a Punjabi children's ASR system under mismatched conditions using noise-robust approaches such as Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC). Acoustic and phonetic variations between adult and children's speech are handled using gender-based in-domain training data augmentation, and the acoustic variability among speakers in the training and test sets is then normalized using Vocal Tract Length Normalization (VTLN). We demonstrate that including pitch features with the test-normalized children's dataset significantly improves system performance across environmental conditions, i.e., clean and noisy. The experimental results show a relative improvement of 30.94% when adult female speech pooled with limited children's speech is used instead of the adult male corpus under noise-based training data augmentation.
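To make the VTLN step concrete, the following is a minimal sketch of a piecewise-linear frequency warp, one common way of realizing vocal tract length normalization. It is not taken from the paper: the function name `vtln_warp`, the 8 kHz Nyquist frequency, the `cutoff_ratio` of 0.85, and the convention that the warp factor `alpha` multiplies the frequency are all assumptions chosen for illustration, and toolkits differ on these details.

```python
def vtln_warp(freq_hz, alpha, nyquist_hz=8000.0, cutoff_ratio=0.85):
    """Piecewise-linear VTLN frequency warp (illustrative sketch).

    Below a cutoff, frequencies are scaled by alpha. Above it, a second
    linear segment joins the scaled cutoff to the Nyquist frequency so
    that the warped axis still spans [0, nyquist_hz].
    All default values are assumptions, not settings from the paper.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie in [0, nyquist_hz]")

    # Shrink the cutoff for alpha > 1 so that alpha * cutoff stays below Nyquist.
    cutoff_hz = cutoff_ratio * nyquist_hz / max(1.0, alpha)

    if freq_hz <= cutoff_hz:
        return alpha * freq_hz

    # Linear segment from (cutoff, alpha * cutoff) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * cutoff_hz) / (nyquist_hz - cutoff_hz)
    return alpha * cutoff_hz + slope * (freq_hz - cutoff_hz)


if __name__ == "__main__":
    # Warp a few filterbank centre frequencies with a hypothetical factor.
    for f in (500.0, 2000.0, 7000.0, 8000.0):
        print(f, round(vtln_warp(f, alpha=1.1), 2))
```

In practice the warp factor is usually estimated per speaker, for example by a search over a small grid of candidate values, and the warp is applied to the filterbank centre frequencies before cepstral features are computed.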
