Developing children’s speech recognition system for low resource Punjabi language

Virender Kadyan, Syed Shanawazuddin, Amitoj Singh

doi:10.1016/j.apacoust.2021.108002

Abstract

Building an automatic speech recognition (ASR) system for children is a very challenging problem, especially when domain-specific training data is absent or insufficient. In this paper, we present our efforts towards developing a children’s ASR system in Punjabi, which is a low-resourced language. Since no speech data from children was available for Punjabi, we first created a small speech corpus consisting of data from both adult and child speakers. Next, an ASR system was developed on a mix of adults’ and children’s speech and tested on children’s speech. Due to differences in acoustic attributes such as formant frequencies, pitch, and speaking rate between adults’ and children’s speech, the developed ASR system is observed to yield a highly degraded recognition rate. To reduce this acoustic mismatch, we explored vocal-tract length normalization (VTLN), explicit pitch modification, and duration modification. All three approaches are observed to be highly effective. To deal with training data scarcity, the role of prosody-modification-based out-of-domain data augmentation is studied. For that purpose, the pitch and speaking rate of the adults’ speech training set are explicitly changed to render it similar to children’s speech. The original and prosody-modified data are then pooled together before learning the acoustic models. Significantly reduced error rates are observed with prosody-modification-based out-of-domain data augmentation. In addition, we studied the effect of varying the number of senones, the number of hidden nodes, and the number of hidden layers, as well as early stopping, resulting in a relative improvement (RI) of 32.1% over the baseline system with varied senones.
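
The abstract reports its headline result as a relative improvement (RI) over a baseline. As a minimal sketch, assuming RI is the usual relative reduction in word error rate (WER), the calculation could look like the following. The function name and the WER values are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer.

    Assumption: RI = 100 * (baseline - new) / baseline. The abstract does not
    spell out the formula, so this is the common convention, not a quote.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs (in percent), not results reported in the paper.
    baseline = 20.0
    improved = 15.0
    print(f"RI = {relative_improvement(baseline, improved):.1f}%")
```

Under this convention, RI is relative to the baseline error rate, so the same RI value corresponds to a smaller absolute WER drop when the baseline is already low.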

Journal: Applied Acoustics
Publication Date: Mar 22, 2021
Citations: 11

