Abstract
This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to exploit, simultaneously, features obtained from nonlinear modeling of the speech reconstructed phase space (RPS) and the standard Mel frequency cepstral coefficients (MFCCs), which have a proven role in speech recognition. With an appropriate embedding dimension, the reconstructed phase space of a speech signal is guaranteed to be topologically equivalent to the dynamics of the speech production system, and could therefore include information that may be absent from linear analysis approaches. In the first part of this paper, Lyapunov exponents (LE) and fractal dimension, two common chaotic features, are tested for speech recognition, followed by a short discussion of why these features are weak for this task. Next, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the GMM parameters and MFCC-based features. A hidden Markov model (HMM)-based speech recognition system and the TIMIT speech database are used to evaluate the performance of the proposed feature set in isolated and continuous speech recognition experiments. In the final continuous speech recognition (CSR) experiments, using triphone models, the proposed features yielded an absolute improvement of 3.7% in phoneme recognition accuracy over MFCC features alone.
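The following Python sketch illustrates the RPS modeling steps named above under stated assumptions. It performs a time-delay embedding of the signal (the usual way a phase space is reconstructed from a scalar series), fits a diagonal-covariance GMM to the embedded points with plain EM, flattens the GMM parameters into a feature vector, and concatenates that vector with an MFCC vector. The embedding dimension, lag, number of mixture components, diagonal covariances, parameter ordering, and all helper names (`embed`, `fit_gmm_diag`, `gmm_feature_vector`, `combine_features`) are illustrative assumptions, not the paper's settings. The MFCC vector is a hypothetical placeholder, and the feature selection step is not shown.

```python
import math
import random


def embed(signal, dim, lag):
    """Time-delay embedding: row n is [s(n), s(n+lag), ..., s(n+(dim-1)*lag)]."""
    n_points = len(signal) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim/lag")
    return [[signal[n + k * lag] for k in range(dim)] for n in range(n_points)]


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def fit_gmm_diag(points, n_comp, n_iter=50, seed=0, var_floor=1e-6):
    """Fit a diagonal-covariance GMM to RPS points with plain EM (assumed setup)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    means = [list(p) for p in rng.sample(points, n_comp)]  # random points as initial means
    mu = [sum(p[d] for p in points) / n for d in range(dim)]
    gvar = [max(sum((p[d] - mu[d]) ** 2 for p in points) / n, var_floor) for d in range(dim)]
    variances = [list(gvar) for _ in range(n_comp)]  # start from the global variance
    weights = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        # E-step: responsibilities, normalized with log-sum-exp for stability.
        resp = []
        for p in points:
            logs = [math.log(weights[k]) + log_gauss_diag(p, means[k], variances[k])
                    for k in range(n_comp)]
            top = max(logs)
            ex = [math.exp(v - top) for v in logs]
            total = sum(ex)
            resp.append([e / total for e in ex])
        # M-step: re-estimate weights, means, and floored variances.
        for k in range(n_comp):
            nk = sum(r[k] for r in resp) + 1e-12  # tiny offset keeps weights > 0
            weights[k] = nk / n
            means[k] = [sum(r[k] * p[d] for r, p in zip(resp, points)) / nk
                        for d in range(dim)]
            variances[k] = [max(sum(r[k] * (p[d] - means[k][d]) ** 2
                                    for r, p in zip(resp, points)) / nk, var_floor)
                            for d in range(dim)]
    return weights, means, variances


def gmm_feature_vector(weights, means, variances):
    """Flatten GMM parameters: weights, then means, then variances (assumed order)."""
    return list(weights) + [x for m in means for x in m] + [x for v in variances for x in v]


def combine_features(mfcc, gmm_feats):
    """Concatenate an MFCC vector (hypothetical input) with the RPS-GMM features."""
    return list(mfcc) + list(gmm_feats)


if __name__ == "__main__":
    frame = [math.sin(0.3 * n) for n in range(400)]  # toy stand-in for a speech frame
    pts = embed(frame, dim=3, lag=5)
    w, m, v = fit_gmm_diag(pts, n_comp=2)
    feats = combine_features([0.0] * 13, gmm_feature_vector(w, m, v))
    print(len(pts), len(feats))
```

With `dim=3` and `n_comp=2`, the GMM contributes 2 weights, 6 mean values, and 6 variances. In a real system, a dedicated library implementation of GMM fitting and MFCC extraction would normally replace these hand-written helpers.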