Adaptive Threshold Estimation of Open Set Voiceprint Recognition Based on OTSU and Deep Learning

Xudong Li, Xinjia Yang, Linhua Zhou

doi:10.4236/jamp.2020.811197

Abstract

To address the problem of open set voiceprint recognition, this paper proposes an adaptive threshold algorithm based on OTSU and deep learning. The bottleneck in open set voiceprint recognition lies in computing the similarity values and the thresholds that separate speakers inside the set from those outside it. The method combines deep learning with classical machine learning: a Deep Belief Network built by stacking three layers of Restricted Boltzmann Machines extracts deep voice features from basic acoustic features. A Gaussian Mixture Model is then trained to compute the similarity value of the features, and OTSU is used to determine the threshold on that similarity value. In experimental testing, the algorithm achieves a false rejection rate of 3.00% for specific speakers, a false acceptance rate of 0.35% for internal speakers, and a false acceptance rate of 0 for external speakers. This improves on the accuracy of traditional methods in open set voiceprint recognition and indicates that the method is feasible and recognizes speakers effectively.
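The thresholding step named in the abstract can be illustrated with a minimal sketch. It applies the standard OTSU criterion (maximize the between-class variance) to a one-dimensional list of similarity scores. The function name `otsu_threshold`, the bin count, and the example scores are assumptions made for illustration; the sketch does not reproduce the paper's DBN-GMM pipeline or its exact procedure.

```python
def otsu_threshold(scores, bins=64):
    """Return the cut that maximizes between-class variance of 1-D scores.

    Hypothetical helper for illustration; `bins` is an assumed setting.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return lo  # all scores identical: no meaningful split exists

    # Build a histogram of the scores over `bins` equal-width bins.
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        idx = min(int((s - lo) / width), bins - 1)
        hist[idx] += 1

    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(scores)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        # Class 0 holds bins 0..i, class 1 holds the remaining bins.
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


# Made-up similarity scores: four low (rejected-like), four high (accepted-like).
example_scores = [-10.2, -9.8, -10.0, -9.5, -0.5, 0.1, -0.2, 0.3]
print(otsu_threshold(example_scores))
```

When the two groups of scores are well separated, every cut in the gap between them gives the same between-class variance, and this sketch returns the lowest such cut.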

Highlights

Voiceprint recognition is a biometric authentication technology that identifies a person from the characteristics of their own voice.
To address the problem of open set voiceprint recognition, this paper proposes an adaptive threshold algorithm based on OTSU and deep learning.
Voiceprint recognition takes two forms. In closed-set recognition, every speaker's voice already exists in the model library; the voice features under test are matched against all speakers in the library, and the speaker with the highest matching degree is taken as the answer. In open-set recognition, the voice under test may not be in the trained model library, so a threshold is required to decide whether to accept or reject it.
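The difference between the closed-set and open-set decision rules can be sketched in a few lines. The function names, speaker labels, scores, and threshold values below are hypothetical; in the paper the similarity values come from a Gaussian Mixture Model, which is not modelled here.

```python
def closed_set_identify(scores):
    """Closed set: always return the enrolled speaker with the highest score."""
    return max(scores, key=scores.get)


def open_set_identify(scores, threshold):
    """Open set: return the best-matching speaker only if the score clears
    the threshold; otherwise reject the voice as outside the set (None)."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical similarity scores of one test utterance against two enrolled speakers.
scores = {"speaker_a": -3.1, "speaker_b": -7.4}
print(closed_set_identify(scores))
print(open_set_identify(scores, threshold=-5.0))
print(open_set_identify(scores, threshold=-2.0))
```

The closed-set rule never rejects, which is why the open-set case depends on how well the threshold is chosen.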

Summary

Introduction

Voiceprint recognition is a biometric authentication technology that identifies a person from the characteristics of their own voice. If the training sample is too small, the point where the false rejection rate (FRR) and the false acceptance rate (FAR) are equal may not be obtainable, so the threshold is less robust and system recognition performance drops. The features used for voiceprint recognition are mainly time-domain parameters, such as energy or amplitude and zero-crossing rate, and transform-domain parameters obtained by framing the original speech signal and then applying a transformation to it, such as the linear prediction coefficient, the linear prediction cepstrum coefficient [3], and the Mel cepstrum coefficient [4]. The experimental results show that the threshold determined by the algorithm performs well in open set voiceprint recognition.
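As a minimal sketch of how the two error rates depend on the threshold, the helper below counts false acceptances and false rejections for an accept-if-score-at-or-above-threshold rule. The function name `far_frr` and all scores are invented for illustration and are not taken from the paper's experiments.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when a trial is accepted if score >= threshold.

    FAR: fraction of impostor trials wrongly accepted.
    FRR: fraction of genuine trials wrongly rejected.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Made-up similarity scores for genuine and impostor trials.
genuine = [-1.0, -0.5, -2.0, -6.0]
impostor = [-9.0, -8.0, -4.0, -7.5]

# Sweep a few candidate thresholds and print both rates at each one.
for threshold in (-8.5, -5.0, -1.5):
    far, frr = far_frr(genuine, impostor, threshold)
    print(threshold, far, frr)
```

With only a handful of trials, both rates move in coarse steps as the threshold changes, so a threshold at which they are exactly equal may not exist.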

OTSU-Based Approach for Threshold Calculation

DBN-Based Deep Voiceprint Feature Extraction

Open-Set Speaker Recognition Experiment Based on OTSU

OTSU Adaptive Threshold Method Based on DBN-GMM

Experimental Results and Analysis

Conclusion