Abstract

The performance of an Automatic Speech Recognition (ASR) system deteriorates when it is used on children's speech, owing to the large acoustic and linguistic mismatch between the spoken utterances of adults and children. A second important reason for the low accuracy of ASR models is the scarcity of children's speech data for a low-resource language such as Punjabi. The work proposed in this paper addresses both challenges, namely acoustic and linguistic variation and data scarcity, and thereby improves the performance of a children's speech ASR system for the Punjabi language. To handle the first issue, acoustic and linguistic variation, the proposed work uses formant modification as a spectral warping technique to reduce the mismatch between children's speech and adult speech. To address the second issue, data scarcity, the ASR models are trained on augmented children's speech data; the augmentation is implemented with Tacotron 2, an end-to-end Text-to-Speech (TTS) generative model. The work also combines the well-established Mel-Frequency Cepstral Coefficients (MFCC) feature extraction technique with Frequency Domain Linear Prediction (FDLP) to propose a hybrid MFCC-FDLP approach for front-end feature extraction. The continuous Punjabi ASR systems are implemented with MFCC, FDLP, and hybrid MFCC + FDLP front-end features, a Time Delay Neural Network (TDNN) for back-end acoustic modeling, and a trigram language model. To increase the robustness of the proposed ASR system, we included a batch of lexically diverse words in our pronunciation model, which yielded a relative improvement of 29.44%.
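The 29.44% figure above is a relative improvement, not an absolute one. The abstract does not name the metric or the baseline, so the sketch below assumes an error rate such as word error rate, where lower is better. The function name `relative_improvement` and the input values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the relative reduction in error rate, in percent.

    Assumes an error-style metric (lower is better), e.g. word error rate.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical error rates for illustration only (not results from the paper):
# a drop from 20.0% to 15.0% is 5 points absolute but 25% relative.
print(relative_improvement(20.0, 15.0))
```

Under this reading, a relative improvement of 29.44% means the error rate fell by that fraction of the baseline value, whatever the baseline was.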
