Abstract

An effective and robust speech feature extraction method is presented. Based on the time–frequency multiresolution property of the wavelet transform, the input speech signal is decomposed into several frequency channels. To capture the characteristics of an individual speaker, the linear predictive cepstral coefficients of the approximation channel and the entropy of the detail channel are calculated at each decomposition level. In addition, an adaptive thresholding technique is applied at each lower resolution to remove the influence of noise interference. Experimental results show that this mechanism not only reduces the influence of noise but also improves recognition performance. The proposed method is evaluated on the MAT telephone speech database for text-independent speaker identification using the group vector quantisation identifier. Several popular existing methods are evaluated for comparison, and the results show that the proposed feature extraction algorithm is more effective and robust than these methods. The performance of the proposed method is also very satisfactory even in a low-SNR environment corrupted by Gaussian white noise.
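
The feature pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the entropy definition, the LPC order or the number of levels, so the Haar wavelet, a Shannon entropy over normalised squared detail coefficients, and the defaults `levels=3` and `order=8` are all assumptions. The adaptive thresholding step is omitted because its rule is not given here, and every function name is hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT (assumed wavelet): (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    x = x[: len(x) - len(x) % 2]  # drop a trailing sample if the length is odd
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def lpc(x, order):
    """LPC polynomial A(z) = 1 + sum_k a[k] z^-k via Levinson-Durbin."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small offset keeps an all-zero frame well defined
    for i in range(1, order + 1):
        if err < 1e-12:  # prediction already exact; leave higher orders at 0
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpcc(a):
    """Cepstral coefficients c[1..p] of the all-pole model 1 / A(z)."""
    p = len(a) - 1
    c = np.zeros(p + 1)
    for n in range(1, p + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(1, n))
        c[n] = -a[n] - acc / n
    return c[1:]

def shannon_entropy(d):
    """Shannon entropy of normalised squared coefficients (assumed definition)."""
    e = np.asarray(d, dtype=float) ** 2
    total = e.sum()
    if total == 0.0:
        return 0.0
    p = e[e > 0] / total
    return float(-np.sum(p * np.log(p)))

def wavelet_features(frame, levels=3, order=8):
    """Per level: LPCC of the approximation plus entropy of the detail."""
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.extend(lpcc(lpc(approx, order)))
        feats.append(shannon_entropy(detail))
    return np.array(feats)
```

With these defaults, a 512-sample frame yields a vector of `levels * (order + 1)` = 27 values: eight cepstral coefficients and one entropy value per level. The approximation is halved in length at each level, so the frame must be long enough for the deepest level to support the chosen LPC order.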
