Abstract

Speaker recognition is the task of identifying a person from the biometric characteristics of voice samples. It has become a widely studied research subject with essential applications in security, assistance, replication, authentication, automation, and verification. Many speaker verification and identification techniques have been implemented using deep learning and neural networks across a variety of datasets. The primary goal of this work is to develop more robust speaker recognition techniques that identify speakers from audio with accuracy approaching human levels of comprehension. The TIMIT and LibriSpeech datasets are used in this paper to develop an efficient automatic speaker recognition system. This work focuses on using mel-frequency cepstral coefficients (MFCCs) to transform audio into spectrogram-like representations without losing the essential features of the audio file in question. We applied both a closed-set and an open-set implementation procedure to these datasets. The closed-set implementation follows the standard machine learning convention of drawing training and test data from the same dataset, which leads to higher accuracy. The open-set implementation, by contrast, trains on one dataset and tests on the other, and its accuracy turned out to be relatively lower. On each dataset, convolutional neural network (CNN) and long short-term memory (LSTM) deep learning techniques were used to identify the speaker, leading to the observation that the CNN achieved higher accuracy.
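
For concreteness, below is a minimal sketch of the feature-extraction and classification pipeline described above. It assumes Python with librosa and TensorFlow/Keras; the file path, the 13-coefficient setting, the fixed frame count, and the CNN architecture are illustrative assumptions, not the paper's reported configuration.

# Minimal sketch of an MFCC + CNN speaker-identification pipeline.
# Parameter values below are assumptions for illustration only.
import numpy as np
import librosa
import tensorflow as tf

def extract_mfcc(path, sr=16000, n_mfcc=13, n_frames=300):
    """Load an utterance and return a fixed-size MFCC 'image' (n_mfcc x n_frames x 1)."""
    y, _ = librosa.load(path, sr=sr)  # resample to 16 kHz (assumed rate)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Per-coefficient mean/variance normalization (a common convention, assumed here).
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    # Pad or truncate to a fixed frame count so the CNN input shape is constant.
    if mfcc.shape[1] < n_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, n_frames - mfcc.shape[1])))
    return mfcc[:, :n_frames, np.newaxis]

def build_cnn(n_speakers, input_shape=(13, 300, 1)):
    """A small CNN classifier over MFCC 'images'; the architecture is an illustrative assumption."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_speakers, activation="softmax"),  # one class per enrolled speaker
    ])

# Hypothetical usage: under the closed-set protocol, train and test splits are
# drawn from the same corpus (e.g., TIMIT, which has 630 speakers); under the
# open-set protocol, the model trains on one corpus and is tested on the other.
# model = build_cnn(n_speakers=630)
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])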
