Abstract

In this work, we have explored excitation source features in addition to vocal tract system features to improve the performance of phone recognition systems (PRSs). The excitation source information is derived by processing linear prediction residual of the speech signal. The vocal tract information is captured using Mel-frequency cepstral coefficient features. The PRSs are developed using hidden Markov models. The robustness of proposed excitation source features is demonstrated using white and babble noisy speech samples. In this work, TIMIT and Bengali speech databases are used for developing PRSs. The tandem PRSs are developed using the phone posteriors obtained from feedforward neural networks. From the results, it is observed that the tandem PRSs developed using the combination of excitation source and vocal tract system features, outperform the conventional tandem systems developed using system features alone. It is also observed that the PRSs developed using the combination of excitation source and vocal tract features, are more robust to noise than the PRSs developed using vocal tract features alone.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call