Abstract
Speaker identification systems perform almost ideally in neutral talking environments but degrade markedly in stressful talking environments. In this paper, we present an effective approach for enhancing speaker identification performance in stressful talking environments based on a novel radial basis function neural network-convolutional neural network (RBFNN-CNN) model. We applied our approach to two distinct speech databases: a local Arabic Emirati-accent dataset and the global English Speech Under Simulated and Actual Stress (SUSAS) corpus. To the best of our knowledge, this is the first work to apply an RBFNN-CNN model to speaker identification in stressful talking environments. Our speaker identification models represent speech signals using Mel-frequency cepstral coefficients (MFCCs) as the feature extraction method. We compared traditional classifiers, such as the support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbors (KNN), with deep learning models, such as the convolutional neural network (CNN) and recurrent neural network (RNN). Our experiments show that the RBFNN-CNN model achieves higher speaker identification performance in stressful environments than both the classical and the deep learning models.
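For illustration, a minimal MFCC feature extraction step might look like the following Python sketch. The use of librosa and the specific parameter values (16 kHz sampling rate, 13 coefficients, per-coefficient normalization) are assumptions made for this example and are not specified by the paper.

```python
# Illustrative MFCC extraction sketch; librosa and all parameter values
# here are hypothetical choices, not the paper's documented pipeline.
import numpy as np
import librosa

def extract_mfcc(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load an utterance and return its MFCC matrix of shape (n_mfcc, frames)."""
    signal, sr = librosa.load(wav_path, sr=sr)  # resample to a fixed rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Mean/variance normalization per coefficient is a common, optional step.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (
        mfcc.std(axis=1, keepdims=True) + 1e-8
    )
    return mfcc
```

The resulting MFCC matrix would then serve as the input representation for the classifiers compared above (SVM, MLP, KNN, CNN, RNN, and the RBFNN-CNN model).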