Abstract

In speaker recognition, a robust recognition method is essential. This paper proposes a speaker verification method that is based on the time-delay neural network (TDNN) and long short-term memory with recurrent project layer (LSTMP) model for the speaker modeling problem in speaker verification. In this work, we present the application of the fusion of TDNN and LSTMP to the i-vector speaker recognition system that is based on the Gaussian mixture model-universal background model. By using a model that can establish long-term dependencies to create a universal background model that contains a larger amount of speaker information, it is possible to extract more feature parameters, which are speaker dependent, from the speech signal. We conducted experiments with this method on four corpora: two in Chinese and two in English. The equal error rate, minimum detection cost function and detection error tradeoff curve are used as criteria for system performance evaluation. The experimental results show that the TDNN–LSTMP/i-vector speaker recognition method outperforms the baseline system on both Chinese and English corpora and has better robustness.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call