Abstract

The Hidden Markov Model (HMM) is a widely used method for speaker recognition. During training, the composite order of the measurement probability matrix and the number of re-evaluations of the initial model affect the speed and accuracy of the recognition system. However, theoretical analysis and quantitative methods are rarely used to select these parameters adaptively. This paper proposes a quantitative method, based on theoretical analysis and experimental results, for adaptively selecting the optimal composite order and the optimal number of re-evaluations. First, the standard deviation (SD) is introduced to calculate the recognition rate, considering its relationship with the dimension of the Mel-frequency cepstral coefficients (MFCCs); the composite order is then optimized according to its relationship curve with the SD. Second, the composite measurement probability is calculated for different numbers of re-evaluations, and the number of re-evaluations is fixed once a convergence condition is satisfied. Experiments show that the recognition rate with the optimal composite order obtained in this paper is 97.02%, and the recognition rate with the optimal number of re-evaluations is 98.9%.

Highlights

  • Speaker recognition refers to identifying a speaker’s identity using characteristic parameters extracted from the speaker’s speech signal [1]

  • The recognition accuracy of the speaker recognition system based on the Hidden Markov Model (HMM) is calculated for increasing composite order M, and the result is shown in Fig. 1: as M increases from 2 to 32, the recognition rate rises rapidly from 80.36% to 100%; as M continues to increase to 128, the recognition rate gradually decreases to 97%

  • When N = 2 and N = 3, the two curves follow a similar pattern: as the number of re-evaluations increases, the recognition rate of the system rises from approximately 70% and stabilizes between 95% and 97%

Summary

INTRODUCTION

Speaker recognition refers to identifying a speaker’s identity using characteristic parameters extracted from the speaker’s speech signal [1]. When an HMM is trained for speaker recognition, certain important initial parameters, including the composite order of the observed probability density matrix and the number of re-evaluations of the initial model, still have to be set manually by the user. This reduces the adaptive ability of the speaker recognition system and affects its recognition accuracy. This paper proposes a quantitative method for adaptively acquiring the Gaussian composite order and the number of re-evaluations, through theoretical analysis and experimental verification, to improve the accuracy and training speed of an HMM-based speaker recognition system.
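
The idea of stopping re-evaluation once a convergence condition holds can be illustrated with a minimal Python sketch. The exact convergence condition used in the paper is not given in this summary, so the relative-change threshold `tol`, the function name `select_num_reevaluations`, and the example scores are all assumptions made for illustration only.

```python
from typing import Sequence


def select_num_reevaluations(scores: Sequence[float], tol: float = 1e-3) -> int:
    """Pick the number of re-evaluations at which training has converged.

    scores[k] is assumed to be a model score (for example, the log of the
    composite measurement probability) after k re-evaluations, with
    scores[0] belonging to the initial model.

    Returns the smallest k >= 1 whose relative change from scores[k - 1]
    is below `tol`. If no step satisfies the condition, returns the
    largest available k, i.e. len(scores) - 1.
    """
    if len(scores) < 2:
        raise ValueError("need the initial score and at least one re-evaluation")
    for k in range(1, len(scores)):
        prev, curr = scores[k - 1], scores[k]
        denom = max(abs(prev), 1e-12)  # guard against division by zero
        if abs(curr - prev) / denom < tol:
            return k
    return len(scores) - 1


# Hypothetical scores, not taken from the paper.
example_scores = [-1000.0, -800.0, -760.0, -759.9, -759.89]
chosen = select_num_reevaluations(example_scores, tol=1e-3)
```

With these made-up scores, the relative change first drops below `tol` at the third re-evaluation, so training would stop there instead of running a manually chosen number of iterations.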

DEFINITION OF HMM
SPEECH FEATURE EXTRACTION
MODEL TRAINING AND IMPORTANT PARAMETERS IN SPEAKER RECOGNITION
Findings
CONCLUSION