Abstract

Building an automatic speech recognition (ASR) system for children is a challenging problem, especially when domain-specific training data is absent or insufficient. In this paper, we present our efforts towards developing a children’s ASR system for Punjabi, which is a low-resource language. Since no speech data from children was available for Punjabi, we first created a small speech corpus consisting of data from both adult and child speakers. Next, an ASR system was trained on a mix of adults’ and children’s speech and tested on children’s speech. Due to differences between adults’ and children’s speech in acoustic attributes such as formant frequencies, pitch, and speaking rate, the developed ASR system yields a highly degraded recognition rate. To reduce this acoustic mismatch, we explored vocal-tract length normalization (VTLN), explicit pitch modification, and duration modification. All three approaches are observed to be highly effective. To deal with the scarcity of training data, we studied the role of prosody-modification-based out-of-domain data augmentation. For that purpose, the pitch and speaking rate of the adults’ speech training set are explicitly changed to make it similar to children’s speech. The original and prosody-modified data are then pooled together before learning the acoustic models. This augmentation significantly reduces the error rates. In addition, we studied the effect of varying the number of senones, the number of hidden nodes, and the number of hidden layers, as well as early stopping, resulting in a relative improvement (RI) of 32.1% in comparison to the baseline system with varied senones.
