Abstract

In speaker recognition systems, feature extraction is challenging under environmental noise. To improve feature robustness, we propose a multiscale chaotic feature for speaker recognition. We use a multiresolution analysis technique to capture finer speaker-specific information in the frequency domain. We then extract the chaotic characteristics of speech based on a nonlinear dynamic model, which improves the discriminative power of the features. Finally, we build a speaker recognition system on a GMM-UBM model. Experimental results confirm the effectiveness of the method: under clean and noisy speech conditions, the equal error rate (EER) of our method is reduced by 13.94% and 26.5%, respectively, compared with the state-of-the-art method.
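The abstract does not say which multiresolution transform or which chaotic measure is used, so the following is only a minimal sketch under stated assumptions. A Haar discrete wavelet transform stands in for the multiresolution analysis, and the Grassberger-Procaccia correlation dimension of a time-delay embedding stands in for the chaotic characteristic of each band. The function names, the embedding dimension `dim`, the delay `tau`, and the quantile-based choice of radii are all hypothetical.

```python
import numpy as np

def haar_dwt_bands(x, levels=3):
    """Multilevel Haar DWT. Returns detail bands from finest to coarsest, then the final approximation."""
    a = np.asarray(x, dtype=float)
    bands = []
    for _ in range(levels):
        if len(a) % 2:                               # pad odd-length input by repeating the last sample
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        bands.append((even - odd) / np.sqrt(2.0))    # high-pass (detail) coefficients
        a = (even + odd) / np.sqrt(2.0)              # low-pass (approximation) coefficients
    bands.append(a)
    return bands

def delay_embed(x, dim=3, tau=1):
    """Phase-space reconstruction: row t is [x[t], x[t+tau], ..., x[t+(dim-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n < 2:
        raise ValueError("signal too short for this embedding")
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

def correlation_dimension(x, dim=3, tau=1, quantiles=(0.05, 0.1, 0.2, 0.4)):
    """Grassberger-Procaccia estimate: slope of log C(r) against log r, where C(r) is the
    fraction of embedded point pairs closer than r. Radii are taken from distance quantiles."""
    pts = delay_embed(x, dim, tau)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pairs = dist[np.triu_indices(len(pts), k=1)]     # each unordered pair counted once
    radii = np.unique(np.quantile(pairs, quantiles))
    radii = radii[radii > 0]
    corr_sum = np.array([np.mean(pairs < r) for r in radii])
    keep = corr_sum > 0
    if keep.sum() < 2:
        return float("nan")                          # not enough scales to fit a slope
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(corr_sum[keep]), 1)
    return float(slope)

def multiscale_chaotic_feature(frame, levels=3, dim=3, tau=1):
    """One correlation-dimension value per wavelet band of a speech frame."""
    return np.array([correlation_dimension(b, dim, tau) for b in haar_dwt_bands(frame, levels)])

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)                     # stand-in for one speech frame
print(multiscale_chaotic_feature(frame))             # 3 detail bands + 1 approximation band
```

A per-band vector like this could be concatenated with conventional cepstral features before GMM-UBM modelling; whether the paper does so is not stated in this summary.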

Highlights

  • Speaker recognition is a biometric technique that identifies a speaker from the speaker-specific information carried in a speech signal.

  • A Gaussian mixture model (GMM) is used to model each speaker, and a universal background model (UBM) is introduced to model the distribution of speaker-independent features. The GMM-UBM model, a generalization of the GMM, is widely used as a classifier for speaker recognition [19,20,21]. It first pretrains a model on feature data collected from other speakers, which mitigates the drop in recognition performance caused by insufficient training data for the current speaker. The pretrained model is then adapted to the target speaker with maximum a posteriori (MAP) adaptation [22].

  • Because the equal error rate (EER) is a well-established measure for evaluating speaker recognition, we selected EER as the metric for evaluating the chaotic features.
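The GMM-UBM pipeline and the EER metric described in the highlights can be sketched as follows. This assumes a diagonal-covariance UBM whose parameters are already trained, and mean-only MAP adaptation with a relevance factor, a common choice in the GMM-UBM literature; the paper's exact adaptation settings are not given here. Trials are scored by the average log-likelihood ratio between the adapted speaker model and the UBM, and the EER is taken where the false-acceptance and false-rejection rates meet. All function names and the relevance value 16 are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k; returns shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def _log_joint(X, weights, means, variances):
    return log_gauss_diag(X, means, variances) + np.log(weights)[None, :]

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal GMM (log-sum-exp for stability)."""
    lj = _log_joint(X, weights, means, variances)
    m = lj.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))))

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's frames X (weights, variances kept)."""
    lj = _log_joint(X, weights, means, variances)
    post = np.exp(lj - lj.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)            # responsibilities, shape (T, K)
    n_k = post.sum(axis=0)                             # soft frame count per component
    first_moment = post.T @ X / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]         # data-dependent adaptation coefficient
    return alpha * first_moment + (1.0 - alpha) * means

def llr_score(X, weights, speaker_means, ubm_means, variances):
    """Verification score: average log-likelihood ratio of the speaker model against the UBM."""
    return (gmm_avg_loglik(X, weights, speaker_means, variances)
            - gmm_avg_loglik(X, weights, ubm_means, variances))

def equal_error_rate(genuine, impostor):
    """EER: operating point where the false-acceptance and false-rejection rates are equal."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thresholds])  # impostor trials accepted
    frr = np.array([np.mean(genuine < t) for t in thresholds])    # genuine trials rejected
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0)
```

Adapting only the means keeps each speaker model aligned component by component with the UBM, which is what makes the likelihood-ratio score cheap and stable when a speaker has little training data.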


Summary

Introduction

Speaker recognition is a biometric technique that identifies a speaker from the speaker-specific information carried in a speech signal. Under channel distortion and background noise, the distribution of cepstral features changes unpredictably, which weakens their discriminative ability. Unlike the cepstral mean subtraction (CMS) method, the relative spectral (RASTA) feature was proposed to compensate for rapidly changing channel distortions; it uses moving-average filtering to approximate the exponential decay of the mean subtraction [10]. This method was later found to give only limited improvements under channel mismatch and additive background noise. Feature compensation aims to improve the discriminative ability of speech features by reducing the influence of noise on them.

The nonlinear characteristics of speech should also be reflected in the features used for speaker recognition. Chaotic features based on nonlinear dynamic models are widely used in speech applications: nonlinear dynamic models have been applied across speech processing, including speech steganalysis [15], speech synthesis [16], speech recognition [17], and speech encryption [18]. The proposed feature represents the chaotic characteristics of the signal in different frequency bands.
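To make the channel-compensation baselines in the introduction concrete, here is a minimal sketch of CMS and of a causal variant that subtracts an exponentially decaying running mean, loosely in the spirit of the description of [10]. It is an illustrative assumption, not the actual RASTA band-pass filter; the function names and the decay constant `alpha=0.98` are hypothetical. A time-invariant convolutional channel appears as an additive constant in the cepstral domain, which is why both methods remove it.

```python
import numpy as np

def cepstral_mean_subtraction(cep):
    """CMS: remove the per-coefficient mean over the whole utterance. cep has shape (frames, coeffs)."""
    cep = np.asarray(cep, dtype=float)
    return cep - cep.mean(axis=0, keepdims=True)

def running_mean_subtraction(cep, alpha=0.98):
    """Causal variant: subtract an exponentially decaying running mean
    m_t = alpha * m_{t-1} + (1 - alpha) * c_t, initialised with the first frame."""
    cep = np.asarray(cep, dtype=float)
    out = np.empty_like(cep)
    mean = cep[0].copy()
    for t in range(len(cep)):
        mean = alpha * mean + (1.0 - alpha) * cep[t]
        out[t] = cep[t] - mean
    return out

rng = np.random.default_rng(0)
cep = rng.standard_normal((100, 13))     # stand-in for 100 frames of 13 cepstral coefficients
channel = rng.standard_normal(13)        # fixed channel = constant offset in the cepstral domain
print(np.abs(cepstral_mean_subtraction(cep + channel) - cepstral_mean_subtraction(cep)).max())
```

CMS needs the whole utterance before it can normalise, while the running-mean variant works frame by frame and can follow a channel that drifts slowly; neither addresses additive noise, which is the limitation the introduction points out.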

Proposed Speaker Recognition System
Multiscale Chaotic Feature
Evaluation and Analysis
Conclusion