Abstract

Extracting speaker-specific feature parameters is vital for speaker recognition, and a single kind of feature cannot fully reflect a speaker's individual characteristics. To represent the speaker's identity more comprehensively and improve the speaker recognition rate, we propose a speaker recognition method based on a fusion feature built from deep and shallow recombined Gaussian supervectors. In this method, deep bottleneck features are first extracted by a Deep Neural Network (DNN) and used as the input of a Gaussian Mixture Model (GMM) to obtain the deep Gaussian supervector. In parallel, the Mel-Frequency Cepstral Coefficients (MFCC) are input directly to a GMM to extract the traditional Gaussian supervector. Finally, the two categories of features are combined by horizontal dimension augmentation, i.e., by concatenating them into one longer vector. In addition, to prevent the recognition rate from falling sharply as the number of speakers to be recognized increases, we introduce an optimization algorithm that finds the optimal weights before feature fusion. The experimental results indicate that the speaker recognition rate based on the directly fused feature reaches 98.75%, which is 5% and 0.62% higher than that of the traditional feature and the deep bottleneck feature, respectively. When the number of speakers increases, the fusion feature based on optimized weight coefficients improves the recognition rate by 0.81%. These results validate that the proposed fusion method effectively exploits the complementarity of different types of features and improves the speaker recognition rate.
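The abstract says that MFCC (or bottleneck) frames are fed to a GMM to obtain a Gaussian supervector, but it does not give the construction. A common one, assumed here rather than taken from the paper, MAP-adapts the means of a universal background model (UBM) to one utterance and stacks the adapted means into a single vector. The NumPy sketch below assumes diagonal covariances and a relevance factor `r = 16`; the function names, sizes, and random stand-in data are hypothetical.

```python
import numpy as np

def frame_posteriors(X, weights, means, variances):
    """Posterior of each diagonal-covariance component for each frame.

    X: (T, D) frames; weights: (K,); means, variances: (K, D).
    Returns gamma: (T, K), each row summing to 1.
    """
    diff = X[:, None, :] - means[None, :, :]                   # (T, K, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_gauss = -0.5 * (log_det[None, :]
                        + np.sum(diff**2 / variances[None, :, :], axis=2))
    log_joint = np.log(weights)[None, :] + log_gauss
    # Normalise in the log domain for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)

def gaussian_supervector(X, weights, means, variances, r=16.0):
    """MAP-adapt the UBM means to frames X and stack them into one (K*D,) vector."""
    gamma = frame_posteriors(X, weights, means, variances)  # (T, K)
    n_k = gamma.sum(axis=0)                                  # zeroth-order stats (K,)
    f_k = gamma.T @ X                                        # first-order stats (K, D)
    e_k = f_k / np.maximum(n_k, 1e-10)[:, None]              # data mean per component
    alpha = (n_k / (n_k + r))[:, None]                       # adaptation coefficients
    adapted = alpha * e_k + (1.0 - alpha) * means
    return adapted.reshape(-1)

# Usage with random stand-in data (hypothetical sizes).
rng = np.random.default_rng(0)
K, D = 4, 13                      # 4 components over 13-dim frames
ubm_w = np.full(K, 1.0 / K)
ubm_mu = rng.normal(size=(K, D))
ubm_var = np.ones((K, D))
frames = rng.normal(size=(200, D))
sv = gaussian_supervector(frames, ubm_w, ubm_mu, ubm_var)
print(sv.shape)  # (52,)
```

Because every utterance yields a vector of the same length (K·D), supervectors from different utterances can be compared or classified directly, which is what makes concatenating a deep and a shallow supervector straightforward.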

Highlights

  • Over the last two decades, with the rapid development of artificial intelligence, voiceprint, iris, fingerprint, face and other biometrics have attracted wide attention [1,2,3]. Speech is the most common way to communicate and convey information in people’s daily life.

  • To take into account the complementarity between different hierarchical features, we propose a novel fusion model that forms a new Gaussian supervector for speaker recognition.

  • We propose a speaker recognition system based on optimized weight coefficients, which improves the robustness of the system.
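The highlight on optimized weight coefficients does not name the optimization algorithm or the classifier used to score each weight. As a hedged illustration only, the sketch below scales a deep and a shallow supervector block by `w` and `1 - w`, concatenates them horizontally, and picks `w` by a plain grid search on held-out accuracy. The single-weight parameterization, the grid search, and the nearest-class-mean classifier are stand-ins assumed here, not the paper's method; all names and data are hypothetical.

```python
import numpy as np

def fuse(deep_sv, shallow_sv, w):
    """Horizontal dimension augmentation: weight each block, then concatenate.

    deep_sv: (N, Dd), shallow_sv: (N, Ds). Returns (N, Dd + Ds).
    """
    return np.hstack([w * deep_sv, (1.0 - w) * shallow_sv])

def nearest_mean_accuracy(train_X, train_y, test_X, test_y):
    """Stand-in classifier (NOT the paper's): assign each test vector to the
    speaker whose mean training vector is closest in Euclidean distance."""
    labels = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
    pred = labels[np.argmin(dists, axis=1)]
    return float(np.mean(pred == test_y))

def search_weight(deep, shallow, y, train_idx, val_idx, grid=None):
    """Grid search for the fusion weight with the best validation accuracy."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    best_w, best_acc = None, -1.0
    for w in grid:
        X = fuse(deep, shallow, w)
        acc = nearest_mean_accuracy(X[train_idx], y[train_idx], X[val_idx], y[val_idx])
        if acc > best_acc:  # keep the first weight reaching the best score
            best_w, best_acc = float(w), acc
    return best_w, best_acc

# Synthetic example: an informative "deep" block and a noisy "shallow" block.
rng = np.random.default_rng(1)
n_spk, n_utt = 3, 10
y = np.repeat(np.arange(n_spk), n_utt)
centers = 5.0 * np.eye(n_spk)
deep = centers[y] + 0.1 * rng.normal(size=(y.size, n_spk))
shallow = 3.0 * rng.normal(size=(y.size, 20))
idx = np.arange(y.size)
train_idx, val_idx = idx[idx % 2 == 0], idx[idx % 2 == 1]
best_w, best_acc = search_weight(deep, shallow, y, train_idx, val_idx)
print(best_w, best_acc)
```

A grid search is used here only because it is easy to inspect; any optimizer that maximizes validation accuracy over the weight could be substituted.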


Summary

Introduction

Over the last two decades, with the rapid development of artificial intelligence, voiceprint, iris, fingerprint, face and other biometrics have attracted wide attention [1,2,3]. To fully express the features of speech signals and take advantage of each model, several studies in recent years have proposed different fusion strategies for speaker recognition. We propose a new speaker recognition method based on the fusion of deep and shallow recombined Gaussian supervectors; by combining features from these different aspects, the speaker’s characteristics can be represented more comprehensively. In this method, the MFCC is first obtained from the input speech signal, and a DNN is used to obtain the bottleneck features, which are then used to acquire the deep Gaussian supervector.
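The introduction states that a DNN produces bottleneck features from speech, but not the network layout. A bottleneck feature is conventionally the activation of a deliberately narrow hidden layer; the NumPy sketch below shows only that extraction step, on an untrained network. The layer sizes, sigmoid hidden units, linear bottleneck, and random stand-in input are assumptions for illustration; in practice the network would first be trained, for example to classify speakers.

```python
import numpy as np

def init_dnn(layer_sizes, rng):
    """Random (untrained) weights and zero biases for a fully connected net."""
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def bottleneck_features(frames, params, bottleneck_layer):
    """Propagate frames through the net and return the pre-activation output of
    weight layer `bottleneck_layer` (the narrow bottleneck layer)."""
    h = frames
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        if i == bottleneck_layer:
            return z                    # linear bottleneck output, no nonlinearity
        h = 1.0 / (1.0 + np.exp(-z))    # sigmoid hidden units
    raise ValueError("bottleneck_layer is out of range")

# Hypothetical sizes: 39-dim input, 40-dim bottleneck, 10 output speakers.
rng = np.random.default_rng(0)
sizes = [39, 256, 40, 256, 10]
params = init_dnn(sizes, rng)
mfcc = rng.normal(size=(100, 39))       # stand-in for 100 frames of MFCC input
bn = bottleneck_features(mfcc, params, bottleneck_layer=1)
print(bn.shape)  # (100, 40)
```

The resulting frame-level bottleneck vectors would then take the place of MFCC frames as input to the GMM when building the deep Gaussian supervector.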

Proposed Speaker Recognition System
Recombined Gaussian Supervector
Figure: Flow chart of extracting Mel-Frequency Cepstral Coefficients (MFCC)
Deep Recombined Gaussian Supervector
Deep Neural Network Model
Extraction of Deep Recombined Gaussian Supervector
Deep Network
Classification Based on Fusion Features
Support Vector Machine
Fisher Criterion Selection
Optimization of Feature Weight Coefficient
Database Description and Experiment Setup
The Impact of Deep Network Parameters on the System
The Superiority of Bottleneck Features
Performance of Recognition
Performance of Speaker
Conclusions