Abstract

This paper presents a feature extraction method, Normalized Gammachirp Cepstral Coefficients (NGCC), that incorporates properties of the peripheral auditory system to improve the robustness of speech recognition in noise. The method uses a second-order low-pass filter and a normalized gammachirp filterbank to emulate the mechanisms of the outer/middle ear and the cochlea. Its recognition performance is evaluated on speech signals in real-world noisy environments. Experimental results show that the method outperforms classical feature extraction methods in terms of speech recognition rate. The Hidden Markov Model based speech recognition system is implemented on the HTK 3.4.1 platform (Hidden Markov Model Toolkit).

Highlights

  • The Automatic Speech Recognition (ASR) system, at its most elementary level, encompasses different methods drawn from research in a wide variety of disciplines and areas such as signal processing, statistical pattern recognition, linguistics and communication theory

  • A static feature vector consisting of 12 coefficients is computed. This vector is combined with the energy (E) and with the differential coefficients of the 1st order (∆) and the 2nd order (∆∆) to yield a feature vector of 39 coefficients for each feature extraction method (NGCC, Mel-Cepstre and Perceptual Linear Prediction (PLP))

  • The results reported in the tables show that the proposed Normalized Gammachirp Cepstral Coefficients (NGCC) feature is more robust than the Mel-Cepstre and PLP features in all noise conditions
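The 39-coefficient count in the highlights follows from (12 static + 1 energy) × 3 (static, ∆, ∆∆). The sketch below illustrates that stacking in plain Python. It is not the authors' code: the function names `deltas` and `stack_features`, the regression window of 2 frames, the edge handling by frame replication, and the toy input values are all assumptions made for illustration. The regression formula is the standard one for differential coefficients.

```python
def deltas(frames, window=2):
    """Regression-based differential coefficients over +/- `window` frames.

    `window=2` and edge replication are assumptions, not taken from the paper.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][d]  # replicate last frame at the edge
                behind = frames[max(t - k, 0)][d]     # replicate first frame at the edge
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(static, energy):
    """Append energy to each 12-dim static vector, then add delta and delta-delta."""
    base = [c + [e] for c, e in zip(static, energy)]     # 13 values per frame
    d1 = deltas(base)                                    # 1st-order differences
    d2 = deltas(d1)                                      # 2nd-order differences
    return [b + x + y for b, x, y in zip(base, d1, d2)]  # 39 values per frame


# Toy input: 5 frames of 12 made-up static coefficients plus a made-up energy value.
static = [[float(t + i) for i in range(12)] for t in range(5)]
energy = [float(t) for t in range(5)]
features = stack_features(static, energy)
```

Each row of `features` holds the static block first, then the ∆ block, then the ∆∆ block; the same stacking would apply whichever front end (NGCC, Mel-Cepstre or PLP) produced the 12 static coefficients.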


Summary

Introduction

The Automatic Speech Recognition (ASR) system, at its most elementary level, encompasses different methods drawn from research in a wide variety of disciplines and areas such as signal processing, statistical pattern recognition, linguistics and communication theory. Conventional feature extraction methods are based on classical signal processing techniques such as linear prediction or filter banks (Perdigao and Sá, 1998). These methods, such as Mel-Cepstre (or Mel frequency cepstral coefficients) (Davis and Mermelstein, 1980) and Perceptual Linear Prediction (PLP) (Hermansky, 1990), are the most widely used in speech recognition systems, but they do not perform well in noisy environments, whereas the human auditory system is able to recognize speech in the presence of noise (Haton et al., 2006). The gammachirp filter was designed to generate an asymmetric gammatone-like filter by modulating the carrier-tone term of the gammatone analytic impulse response in frequency (Meddis et al., 2010). This characteristic of the gammachirp filter was inspired by the fact that the basilar membrane impulse response is frequency modulated (Irino and Patterson, 1997; 2006; Unoki et al., 2006).
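To make the frequency-modulation idea concrete, the sketch below evaluates a gammachirp impulse response in its commonly cited form, g(t) = t^(n-1) · exp(-2π·b·ERB(fc)·t) · cos(2π·fc·t + c·ln t + φ), where the c·ln t term is the chirp added to the gammatone carrier. This is an illustration only, not the paper's normalized filterbank: the parameter values (n = 4, b = 1.019, c = -2), the ERB approximation, the function names, and the peak scaling used here are all assumptions.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 + 0.108 * fc_hz


def gammachirp(fc_hz, fs_hz, duration_s=0.025, n=4, b=1.019, c=-2.0, phase=0.0):
    """Sampled gammachirp impulse response, scaled to unit peak amplitude.

    Setting c = 0 removes the chirp term and gives a gammatone response.
    The unit-peak scaling is a convenience, not the paper's normalization.
    """
    samples = []
    count = int(round(duration_s * fs_hz))
    for i in range(1, count + 1):  # start at t > 0 because of ln(t)
        t = i / fs_hz
        envelope = t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fc_hz) * t)
        carrier = math.cos(2.0 * math.pi * fc_hz * t + c * math.log(t) + phase)
        samples.append(envelope * carrier)
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]


# Hypothetical example: one channel centred at 1 kHz, sampled at 16 kHz.
impulse_response = gammachirp(1000.0, 16000.0)
```

A filterbank would repeat this for a set of centre frequencies and convolve each response with the speech signal; how the paper spaces and normalizes its channels is not stated in this summary.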

