Denoising Speech for MFCC Feature Extraction Using Wavelet Transformation in Speech Recognition System

Risanuri Hidayat, Anggun Winursito, Agus Bejo, Sujoko Sumaryono

doi:10.1109/iciteed.2018.8534807

Abstract

The Mel-frequency cepstral coefficient (MFCC) method is a popular feature extraction method for speech recognition systems. Although it achieves high accuracy, it is susceptible to noise: the performance of conventional MFCC degrades when the input signal is noisy. This paper presents the implementation of wavelet denoising on the speech input of the MFCC feature extraction method. Adding a denoising step based on the wavelet transform was expected to improve MFCC performance on noisy signals. The study used 120 speech samples, of which 30 served as reference data and the remaining 90 as test data. The test data were mixed with white Gaussian noise and then passed to a speech recognition system that already held the reference data. The wavelet denoising process used soft thresholding with the Minimaxi thresholding rule. Eleven wavelets at decomposition level 10 were tested in the denoising process, and classification used the K-nearest neighbor (KNN) method. The Fejer-Korovkin 6 wavelet was the best denoising method, achieving the highest accuracy on input signals with an SNR of 5-15 dB, while the Daubechies 5 wavelet achieved high accuracy on input signals with an SNR of 3 dB. Compared with the system without denoising, all of the tested wavelet denoising methods improved the accuracy of the speech recognition system on input signals with an SNR of 0-10 dB.
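
The noise-mixing and denoising steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a Haar wavelet stands in for the Fejer-Korovkin and Daubechies wavelets so that the example needs only the Python standard library, the decomposition depth is 4 rather than 10, the minimax threshold uses the closed-form approximation commonly found in wavelet toolboxes, and the noise level is estimated from the median absolute value of the finest detail coefficients. All function names are hypothetical.

```python
import math
import random
import statistics


def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in dB."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (soft thresholding)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def minimax_threshold(n):
    """Assumed closed-form approximation of the minimax threshold rule."""
    return 0.3936 + 0.1829 * math.log2(n) if n > 32 else 0.0


def haar_denoise(signal, levels):
    """Denoise with a Haar transform; len(signal) must be divisible by 2**levels."""
    s = math.sqrt(2.0)
    approx = list(signal)
    details = []
    # Forward transform: split into approximation and detail coefficients.
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / s for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / s for i in pairs]
    # Noise scale estimated from the finest detail level (assumption).
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    thr = sigma * minimax_threshold(len(signal))
    details = [soft_threshold(d, thr) for d in details]
    # Inverse transform, from the coarsest level back to the finest.
    for d in reversed(details):
        rec = []
        for a_k, d_k in zip(approx, d):
            rec.append((a_k + d_k) / s)
            rec.append((a_k - d_k) / s)
        approx = rec
    return approx


def snr_db(clean, estimate):
    """SNR in dB of an estimate relative to the clean signal."""
    sig = sum(x * x for x in clean)
    err = sum((x - y) ** 2 for x, y in zip(clean, estimate))
    return 10 * math.log10(sig / err)


# Toy stand-in for a speech frame: a slow sinusoid, mixed with noise at 5 dB.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * n / 1024) for n in range(1024)]
noisy = add_white_noise(clean, 5, rng)
denoised = haar_denoise(noisy, levels=4)
print(f"noisy: {snr_db(clean, noisy):.2f} dB, denoised: {snr_db(clean, denoised):.2f} dB")
```

In a pipeline like the one the abstract describes, the denoised signal would then be passed to MFCC extraction and KNN classification; those stages are omitted here.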

Full Text