Abstract
HMM acoustic models are typically trained on a single set of cepstral features extracted over the full bandwidth of mel-spaced filterbank energies. In this paper, multi-resolution sub-band transformations of the log energy spectra are introduced based on the conjecture that additional cues for phonetic discrimination may exist in the local spectral correlates not captured by the full-band analysis. In this approach the discriminative contribution from sub-band features is considered to supplement rather than substitute for full-band features. HMMs trained on concatenated multi-resolution cepstral features are investigated, along with models based on linearly combined independent multi-resolution streams, in which the sub-band and full-band streams represent different resolutions of the same signal. For the stream-based models, discriminative training of the linear combination weights to a minimum classification error criteria is also applied. Both the concatenated feature and the independent stream modelling configurations are demonstrated to outperform traditional full-band cepstra for HMM-based acoustic phonetic modelling on the TIMIT database. Experiments on context-independent modelling achieve a best increase on the core test set from an accuracy of 62.3% for full-band models to a 67.5% accuracy for discriminately weighted multi-resolution sub-band modelling. A triphone accuracy of 73.9% achieved on the core test set improves notably on full-band cepstra and compares well with results previously published on this task.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.