Abstract

In this work, a Punjabi children's speech recognition system is developed under different matched and mismatched acoustic conditions. A major problem in children's speech recognition is the difference in acoustic attributes between children's and adults' speech signals, which leads to poor recognition rates for children's speech. This paper shows how pitch-enhanced features extracted in the front-end feature extraction stage play an important role under mismatched acoustic conditions. When pitch is enhanced using cepstral analysis during feature extraction, the recognition rate of the children's speech recognition system increases on datasets from different age groups, compared with the standard acoustic features extracted using Mel Frequency Cepstral Coefficients (MFCC). The Kaldi toolkit is used to build the children's speech recognition models at different phoneme levels. Results show a word error rate (WER) improvement of 0.03% to 16.47% under different acoustic conditions when pitch-enhanced features are used.
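The abstract does not detail the exact pitch-enhancement procedure. As a rough illustration of the general idea only, the sketch below estimates the fundamental frequency (F0) of one frame from the peak of its real cepstrum and appends log-F0 as an extra column to an MFCC matrix. The function names (`cepstral_f0`, `append_pitch`), the 80 to 600 Hz search range chosen to cover children's higher pitch, and the log-F0 concatenation are all assumptions for illustration. This is not the authors' Kaldi pipeline.

```python
import numpy as np

def cepstral_f0(frame: np.ndarray, sr: int, fmin: float = 80.0, fmax: float = 600.0) -> float:
    """Estimate F0 (Hz) of one voiced frame from the peak of its real cepstrum.

    Assumption: the frame is voiced and longer than sr / fmin samples.
    """
    windowed = frame * np.hamming(len(frame))
    # Log-magnitude spectrum: harmonics make it periodic in frequency.
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    # Real cepstrum: that periodicity shows up as a peak at the pitch period.
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    # Only search quefrencies (in samples) that match plausible pitch periods.
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return sr / peak

def append_pitch(mfcc: np.ndarray, f0: np.ndarray) -> np.ndarray:
    """Hypothetical helper: append log-F0 as an extra column to a (frames x coeffs) MFCC matrix."""
    return np.hstack([mfcc, np.log(f0)[:, None]])

# Usage: a synthetic 40 ms harmonic frame at a child-like F0 of 250 Hz.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
f0_true = 250.0
frame = sum(np.sin(2 * np.pi * k * f0_true * t) / k for k in range(1, int(sr / 2 // f0_true)))
est = cepstral_f0(frame, sr)
print(f"estimated F0: {est:.1f} Hz")
```

A real system would run this per frame, smooth the F0 track, and handle unvoiced frames before concatenating it with the MFCCs. Kaldi ships its own pitch extractor for that purpose.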
