Speaker identification under noisy conditions using hybrid convolutional neural network and gated recurrent unit

Worku Jifara, Ramasamy Srinivasagan, Ali Alzahrani, Wondimu Lambamo Anito

doi:10.11591/ijai.v13.i1.pp1050-1062

Open Access

Abstract

Speaker identification is a biometric technique that distinguishes a person from other speakers based on speech characteristics. Recently, deep learning models have outperformed conventional machine learning models in speaker identification. Spectrograms of speech have been used as input to deep learning-based speaker identification with clean speech. However, the performance of speaker identification systems degrades under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrids of convolutional neural network (CNN) and recurrent neural network (RNN) variants have outperformed standalone CNN or RNN variants in recent studies. However, no study has yet combined a hybrid of CNN and enhanced RNN variants with cochleogram input to improve speaker identification under noisy and mismatched conditions. In this study, a speaker identification model based on a hybrid CNN and gated recurrent unit (GRU) with cochleogram input is proposed for noisy conditions. The VoxCeleb1 audio dataset was used for the experiments under three conditions: with real-world noise, with white Gaussian noise (WGN), and without additive noise. The experimental results show that the proposed model outperforms both the other models evaluated in this study and existing works.
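The WGN condition mentioned above can be illustrated with a minimal sketch of mixing white Gaussian noise into a signal at a chosen signal-to-noise ratio (SNR). The abstract does not describe the mixing procedure, so everything here is an assumption for illustration: the function names `add_white_gaussian_noise` and `measured_snr_db`, the 5 dB SNR, the 16 kHz sample rate, and the synthetic 440 Hz tone standing in for a speech utterance.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, seed=None):
    """Return a copy of `signal` with zero-mean white Gaussian noise at `snr_db`.

    Hypothetical helper for illustration; not the procedure used in the paper.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# Assumed stand-in for an utterance: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noisy = add_white_gaussian_noise(clean, snr_db=5, seed=0)
```

Calling `measured_snr_db(clean, noisy)` should return a value close to the requested 5 dB; the small deviation comes from estimating the noise power on a finite number of samples.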
