Abstract

To address the problem of open-set voiceprint recognition, this paper proposes an adaptive threshold algorithm based on OTSU and deep learning. The bottleneck in open-set voiceprint recognition is computing the similarity values of speakers inside and outside the set, and the threshold that separates them. This paper combines deep learning and machine learning methods. A Deep Belief Network (DBN) built from three stacked Restricted Boltzmann Machines (RBMs) extracts deep voice features from basic acoustic features. A Gaussian Mixture Model (GMM) is then trained to compute the similarity value of these features, and OTSU is used to determine the threshold on that similarity value. In experimental tests, the algorithm achieves a false rejection rate of 3.00% for specific speakers, a false acceptance rate of 0.35% for in-set speakers, and a false acceptance rate of 0 for out-of-set speakers. This improves on the accuracy of traditional methods in open-set voiceprint recognition and shows that the method is feasible and achieves good recognition performance.
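The decision step described in the abstract (score each trial, choose the threshold with OTSU, then report FRR and FAR) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that higher similarity means a better match, that OTSU is applied to a histogram of pooled trial scores, and that a trial is accepted when its score is at or above the threshold. The function names `otsu_threshold` and `error_rates` and the `bins=64` default are hypothetical.

```python
from typing import Sequence, Tuple


def otsu_threshold(scores: Sequence[float], bins: int = 64) -> float:
    """Otsu's method on 1-D similarity scores (hypothetical helper).

    Scores are binned into a histogram, and the bin edge that maximizes the
    between-class variance w0 * w1 * (mu0 - mu1)**2 is returned.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return lo  # all scores identical: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        idx = min(int((s - lo) / width), bins - 1)  # clamp the max score into the last bin
        hist[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(scores)
    sum_all = sum(h * c for h, c in zip(hist, centers))

    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0  # running count and score sum of the lower class
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class empty: variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if between > best_var:
            best_var = between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """FRR = share of genuine trials rejected; FAR = share of impostor trials accepted."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far
```

For example, with toy scores `genuine = [0.82, 0.91, 0.77, 0.88, 0.95]` and `impostor = [0.12, 0.25, 0.31, 0.18, 0.40]`, `otsu_threshold(genuine + impostor)` lands in the gap between the two clusters, so both error rates are zero. Real score distributions overlap, which is where the choice of threshold trades FRR against FAR.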

Highlights

  • Voiceprint recognition is a biometric authentication technology that identifies a person from the characteristics of their own voice

  • To address the problem of open-set voiceprint recognition, this paper proposes an adaptive threshold algorithm based on OTSU and deep learning

  • Voiceprint recognition comes in two forms. In closed-set recognition, every speaker's voice already exists in the model library: the test voice features are matched against all speakers in the library, and the speaker with the highest matching score is taken as the identified speaker. In open-set recognition, the test voice features may come from a speaker who is not in the trained model library, so a threshold is needed to decide whether to accept or reject
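The closed-set versus open-set distinction above amounts to an argmax over the model library followed by an optional threshold check. The sketch below illustrates that under stated assumptions: each enrolled speaker is modeled by a diagonal-covariance GMM (the abstract mentions GMM training but not the covariance type), the similarity value is the average per-frame log-likelihood (an assumption; the summary does not define the score), and the GMM parameters are supplied by hand rather than trained. All names (`Gmm`, `frame_log_likelihood`, `utterance_score`, `identify`) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation of a diagonal-covariance GMM:
# a list of (weight, means, variances) tuples, one per component.
Gmm = List[Tuple[float, List[float], List[float]]]


def frame_log_likelihood(x: Sequence[float], gmm: Gmm) -> float:
    """log p(x) under a diagonal GMM, combined with log-sum-exp for stability."""
    logs = []
    for weight, mean, var in gmm:
        ll = math.log(weight)
        for xi, mi, vi in zip(x, mean, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(ll)
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))


def utterance_score(frames: Sequence[Sequence[float]], gmm: Gmm) -> float:
    """Average per-frame log-likelihood, used here as the similarity value."""
    return sum(frame_log_likelihood(f, gmm) for f in frames) / len(frames)


def identify(frames: Sequence[Sequence[float]], models: Dict[str, Gmm],
             threshold: float) -> Optional[str]:
    """Closed-set step: pick the best-matching enrolled speaker.
    Open-set step: reject (return None) if even the best score is below threshold."""
    scores = {name: utterance_score(frames, gmm) for name, gmm in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Dropping the final threshold comparison turns `identify` back into a pure closed-set classifier, which always returns some enrolled speaker even for an out-of-set voice; the threshold (for instance one chosen by OTSU) is what allows rejection.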


Summary

Introduction

Voiceprint recognition is a biometric authentication technology that identifies a person from the characteristics of their own voice. If the training data sample is too small, the point where the false rejection rate (FRR) and the false acceptance rate (FAR) are equal (the equal error rate point) may not be found, so the threshold is less robust and system recognition performance drops. The features used for voiceprint recognition fall mainly into two groups: time-domain feature parameters, such as energy or amplitude and zero-crossing rate; and transform-domain feature parameters, obtained by framing the original speech signal and then applying a transformation to it, such as linear prediction coefficients, linear prediction cepstral coefficients [3], and Mel-frequency cepstral coefficients [4]. The experimental results show that the threshold determined by the algorithm performs well in open-set voiceprint recognition.
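The two time-domain features named above, energy and zero-crossing rate, are computed per frame after the signal has been split into frames. The sketch below is illustrative only and is not the paper's feature pipeline. It assumes a plain list of samples, frames set by hypothetical `frame_len` and `hop` parameters, short-time energy defined as the sum of squared amplitudes, and one common zero-crossing-rate normalization (sign changes divided by the number of adjacent sample pairs). Other normalizations are also in use.

```python
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into frames of frame_len samples, starting every hop samples.
    Trailing samples that do not fill a whole frame are dropped."""
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared amplitudes in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

For example, `frame_signal([1, -1, 1, -1, 0.5, 0.5, 0.5, 0.5], 4, 4)` yields two frames. The first alternates sign, so it has energy 4 and a zero-crossing rate of 1.0. The second is constant, so it has energy 1.0 and a zero-crossing rate of 0.0. In practice, frames of a few tens of milliseconds with overlapping hops are typical.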

OTSU-Based Approach for Threshold Calculation
DBN-Based Deep Voiceprint Feature Extraction
Open-Set Speaker Recognition Experiment Based on OTSU
OTSU Adaptive Threshold Method Based on DBN-GMM
Experimental Results and Analysis
Conclusion
