Abstract

Recognizing children's speech under mismatched conditions, i.e., with acoustic models trained on adults' speech, is a challenging task. It is well known that mel-frequency cepstral coefficient (MFCC) features carry both the speech-relevant information and the factors responsible for the mismatch. In this work, we therefore explore truncation of MFCC features for recognizing children's speech with adult-trained models. Cepstral truncation has already been shown to improve results, but truncating too aggressively discards relevant spectral information. Motivated by this, we explore a soft-weighting technique, heteroscedastic linear discriminant analysis (HLDA), to limit the loss of cepstral information during truncation, and we propose an HLDA-transformation-based technique for reducing the mismatch. Further, we attempt to establish a linear relationship between the HLDA transformation subspace of the MFCC features and the vocal tract length normalization (VTLN) warp factor values. Finally, we explore a scheme for concatenating VTLN and constrained maximum likelihood linear regression (CMLLR). Compared with direct cepstral truncation, the proposed approach improved performance by 42.18% on connected digit recognition and by 18.88% on continuous speech recognition.
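
The contrast between direct truncation and a linear transformation can be illustrated with a minimal sketch. It is not the method of this paper: the function names, the toy frames, and the choice of three retained coefficients are all hypothetical, and no HLDA estimation is performed. It only shows that hard truncation is the special case of a linear projection whose matrix is the selector [I | 0]; a learned transform such as HLDA would replace that matrix with estimated weights.

```python
# Illustrative sketch only; names and values are hypothetical, not from the paper.

def truncate_cepstra(frames, keep):
    """Direct truncation: keep the first `keep` coefficients of each frame."""
    return [frame[:keep] for frame in frames]

def project_cepstra(frames, transform):
    """Apply a (keep x dim) linear transform to each frame (row times frame)."""
    return [
        [sum(w * c for w, c in zip(row, frame)) for row in transform]
        for frame in frames
    ]

def selector_matrix(keep, dim):
    """Build [I | 0]: the projection that reproduces hard truncation."""
    return [[1.0 if j == i else 0.0 for j in range(dim)] for i in range(keep)]

# Two toy frames of five coefficients each (made-up numbers).
frames = [
    [1.0, -2.0, 0.5, 3.0, -1.5],
    [0.25, 4.0, -0.75, 2.0, 1.0],
]

keep = 3
truncated = truncate_cepstra(frames, keep)
projected = project_cepstra(frames, selector_matrix(keep, len(frames[0])))

# Hard truncation and projection with the selector matrix agree.
assert truncated == projected
```

Replacing the selector with a dense matrix lets every output dimension draw on all input coefficients, which is the sense in which a transform-based approach weights the higher-order cepstra softly rather than discarding them outright.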
