Abstract

Speaker identification is the process of automatically determining who is speaking from a set of speakers known to the model. It is crucial in voice-based authentication, forensic investigations, security, and surveillance. Recent studies have shown that combinations of convolutional neural network (CNN) and recurrent neural network (RNN) variants outperform either model used separately. However, only a limited number of studies have applied such CNN-RNN combinations to speaker identification. In this study, we propose a speaker identification model that combines a two-dimensional CNN (2DCNN) with a bidirectional gated recurrent unit (BiGRU) to improve performance. The proposed model integrates the complementary strengths of the two components: the 2DCNN layers extract short-term spatial features from the input while requiring relatively few parameters, and the BiGRU layers capture long-term temporal dependencies among these features in both the forward and backward directions while converging efficiently during training. Spectrograms of the speech signal are used as input because they retain rich speaker-specific acoustic features. For comparison, additional experiments were conducted with 2DCNN, CNN-LSTM, CNN-BiLSTM, and CNN-GRU models. All experiments were conducted on the VoxCeleb1 audio dataset, which consists of 153,516 utterances collected from 1,251 speakers. The proposed model achieves an accuracy of 98.28%, precision of 99.08%, recall of 98.92%, and F1 score of 98.97%. A comparison with existing works and with the other models evaluated in this study shows that the proposed model achieves higher performance than both.
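
The abstract describes the architecture only at a high level. The following PyTorch sketch illustrates one plausible way a spectrogram input, a 2DCNN front end, and a BiGRU back end could be combined for 1,251-speaker classification; the layer counts, channel sizes, and hidden dimensions are assumptions for illustration, not the authors' reported configuration.

import torch
import torch.nn as nn

class CNNBiGRU(nn.Module):
    """Illustrative hybrid 2DCNN + BiGRU speaker-identification model."""

    def __init__(self, n_speakers=1251, n_mels=64, cnn_channels=(32, 64)):
        super().__init__()
        # 2D convolutional front end: extracts short-term spatial features
        # from the input spectrogram (batch, 1, n_mels, time).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, cnn_channels[0], kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(cnn_channels[0], cnn_channels[1], kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        freq_out = n_mels // 4  # two 2x poolings along the frequency axis
        # Bidirectional GRU: models long-term temporal dependencies over the
        # CNN feature sequence in both the forward and backward directions.
        self.bigru = nn.GRU(
            input_size=cnn_channels[1] * freq_out,
            hidden_size=128,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * 128, n_speakers)

    def forward(self, spec):
        # spec: (batch, 1, n_mels, time)
        feats = self.cnn(spec)                                 # (batch, C, F, T)
        b, c, f, t = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)   # (batch, T, C*F)
        out, _ = self.bigru(seq)                               # (batch, T, 256)
        pooled = out.mean(dim=1)                               # temporal average pooling
        return self.classifier(pooled)                         # speaker logits

if __name__ == "__main__":
    model = CNNBiGRU()
    dummy = torch.randn(2, 1, 64, 300)  # two spectrograms: 64 mel bins, 300 frames
    print(model(dummy).shape)           # torch.Size([2, 1251])

In this sketch the CNN output is flattened along the frequency axis so that each time step becomes one element of the sequence fed to the BiGRU, reflecting the division of labour described in the abstract: convolutions for local spatial patterns, the recurrent layer for long-range temporal context.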
