Abstract

Speaker recognition technology aims to identify a speaker from his or her speech, and it has been studied for many years. This article proposes an improved convolutional recurrent neural network, CNN-LSTM, which combines a convolutional neural network with a recurrent neural network, to perform speaker recognition. The original speech is first converted into a grayscale spectrogram, from which features are extracted by an optimized CNN structure and passed to an LSTM; the output of the LSTM is then fed into two fully connected layers, whose output is classified. Training speech through the CNN-LSTM network yields a model with high recognition accuracy; other speech is then input into the trained model, and if the output meets the established accuracy threshold, the speaker's identity is recognized. CNN-LSTM achieves better recognition accuracy than a CNN-DNN structure, improving the recognition rate by about 4%. Converting the speech signal into a spectrogram also makes text-independent speaker recognition easy to implement. We added L2 regularization to the final classification layer of CNN-LSTM, a normalization layer after each layer of the network, the Adam optimizer, and a GaussianNoise layer. With the combination of these four methods, the accuracy of the original model increases from 80% to 92%. The improved network makes text-independent speaker recognition easier to implement than traditional identification methods, is superior to the unmodified CNN structure, and achieves a satisfactory recognition rate without an overly complex neural network model.
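
The following is a minimal Keras sketch of the kind of CNN-LSTM pipeline the abstract describes, with the four additions (GaussianNoise, per-layer normalization, L2 regularization on the classification layer, Adam optimizer). The spectrogram size, filter counts, layer widths, and number of speakers are illustrative assumptions, not values reported in the paper.

```python
# Hypothetical CNN-LSTM speaker-recognition model; all hyperparameters are assumed.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

NUM_SPEAKERS = 50             # assumed number of enrolled speakers
INPUT_SHAPE = (128, 128, 1)   # assumed grayscale spectrogram size

model = models.Sequential([
    layers.Input(shape=INPUT_SHAPE),
    layers.GaussianNoise(0.1),                       # noise injection layer
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),                     # normalization after each layer
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    # Collapse the frequency and channel axes so the time axis feeds the LSTM.
    layers.Reshape((32, 32 * 64)),
    layers.LSTM(128),
    layers.BatchNormalization(),
    layers.Dense(128, activation='relu'),            # first fully connected layer
    layers.Dense(NUM_SPEAKERS, activation='softmax', # classification layer with L2
                 kernel_regularizer=regularizers.l2(1e-4)),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

Spectrograms would be fed in as batches of shape (batch, 128, 128, 1) with integer speaker labels; a test utterance is accepted as a given speaker only if its softmax score meets the established accuracy threshold mentioned in the abstract.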
