Abstract
This paper describes speech signal modeling techniques that are well suited to high-performance, robust isolated word recognition. We present new techniques for incorporating spectral/temporal information as a function of the temporal position within each word. In particular, spectral/temporal parameters are computed using both variable-length blocks and variable spacing between blocks. We tested features computed with these methods on an alphabet recognition task based on the ISOLET database. The Hidden Markov Model Toolkit (HTK) was used to implement the isolated word recognizer with whole-word HMMs. The best accuracy achieved for speaker-independent alphabet recognition, using 50 features, was 97.9%, which represents a new benchmark for this task. We also tested these methods under deliberate signal degradation, using additive Gaussian noise and telephone band limiting, and found that recognition accuracy degrades gracefully and to a smaller degree than for control cases based on MFCCs and delta cepstral terms.
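The idea of blocks whose length and spacing both depend on position within the word can be illustrated with a minimal sketch. The abstract does not give the actual block schedule or the per-block parameters, so everything here is an assumption made for illustration: the function name `block_layout`, its arguments, and the linear interpolation of block length and step between word onset and word end are all hypothetical.

```python
def block_layout(n_frames, len_start, len_end, step_start, step_end):
    """Return (start, end) frame ranges covering a word of n_frames frames.

    ASSUMPTION: block length and block spacing are linearly interpolated
    between their values at word onset and word end. This schedule is
    illustrative only and is not taken from the paper.
    """
    blocks = []
    start = 0
    while start < n_frames:
        # Relative position in the word: 0.0 at onset, 1.0 at the last frame.
        pos = start / max(n_frames - 1, 1)
        length = round(len_start + pos * (len_end - len_start))
        step = round(step_start + pos * (step_end - step_start))
        # Clip the block at the end of the word; keep at least one frame.
        end = min(start + max(length, 1), n_frames)
        blocks.append((start, end))
        # Always advance by at least one frame so the loop terminates.
        start += max(step, 1)
    return blocks


if __name__ == "__main__":
    # Short, densely spaced blocks near the onset; longer, sparser ones later.
    for start, end in block_layout(20, len_start=4, len_end=8,
                                   step_start=2, step_end=4):
        print(start, end)
```

Under these assumptions, each returned frame range would then be summarized by whatever spectral/temporal parameters the recognizer uses; with equal start and end values the layout reduces to ordinary fixed-length, fixed-step blocking.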
More From: IEEE Transactions on Speech and Audio Processing