Abstract

Most current speaker recognition algorithms are developed and evaluated in clean acoustic conditions, and their performance degrades in noisy environments. To improve the accuracy of speaker recognition in noise, a new feature extraction method, the wavelet packet and Gammatone (WPGT) model, is proposed. In this model, the wavelet packet transform decomposes the signal into both its high-frequency and low-frequency components, and the Gammatone filter bank simulates the human auditory system to process non-linear signals, so that more complete speaker voice features are extracted. A convolutional neural network is then trained on these features to perform speaker recognition. Using open-source speech data sets and versions of them mixed with noise, the proposed method is compared with two commonly used voiceprint feature extraction methods, MFCC and Gammatone. The experimental results show that, in a noisy environment, WPGT is more robust to noise than either MFCC or Gammatone: its accuracy is higher by 10.63% and 16.91%, respectively.
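
The two signal-processing building blocks named above can be illustrated with a minimal NumPy sketch. It is not the WPGT implementation: the abstract does not say which wavelet, decomposition depth, or filter-bank layout is used, nor how the two representations are combined. The Haar wavelet, the depth of 3, the four centre frequencies, and the helper names `haar_wavelet_packet`, `gammatone_ir`, and `toy_features` are all assumptions made for illustration, as is the choice to concatenate log sub-band energies. The CNN classification stage is omitted.

```python
import numpy as np


def haar_wavelet_packet(x, level):
    """Full Haar wavelet packet tree (assumed wavelet); returns 2**level sub-bands.

    Unlike the plain wavelet transform, both the low-pass and the high-pass
    branch are split again at every level.
    """
    x = np.asarray(x, dtype=float)
    x = x[: len(x) - len(x) % 2 ** level]  # trim so every split is even
    nodes = [x]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2))  # low-pass half
            next_nodes.append((even - odd) / np.sqrt(2))  # high-pass half
        nodes = next_nodes
    return nodes


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Peak-normalised gammatone impulse response centred on fc (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))


def toy_features(frame, fs, level=3, centre_freqs=(200.0, 500.0, 1000.0, 2000.0)):
    """Hypothetical combination: log energy per wavelet packet sub-band,
    followed by log energy per gammatone channel."""
    eps = 1e-10  # avoids log(0) on silent frames
    wp = [np.log(np.sum(band ** 2) + eps)
          for band in haar_wavelet_packet(frame, level)]
    gt = [np.log(np.sum(np.convolve(frame, gammatone_ir(fc, fs), mode="same") ** 2) + eps)
          for fc in centre_freqs]
    return np.array(wp + gt)


if __name__ == "__main__":
    fs = 16000
    frame = np.sin(2 * np.pi * 1000.0 * np.arange(512) / fs)
    print(toy_features(frame, fs).shape)
```

In a full pipeline, a vector like this would be computed per frame and the stacked frames passed to the classifier. The Haar wavelet is used here only because it needs no external library; a smoother wavelet would likely be preferred in practice.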
