Abstract

This paper describes the impact of spoken language and emotional variation on a multilingual speaker identification (SID) system. Developing speech technology applications for low resource languages (LRL) is challenging because suitable speech corpora are unavailable. The paper presents a performance analysis of SID in six Eastern and North Eastern (E&NE) Indian languages and on an emotional corpus covering six basic emotions. For this purpose, six experiments are carried out using the collected E&NE LRL data to build speaker identification models. Speaker-specific acoustic characteristics are extracted from the speech segments as short-term spectral features, namely shifted delta cepstral (SDC) and partial correlation (PARCOR) coefficients. Gaussian mixture model (GMM) and support vector machine (SVM)-based models are developed to represent the speaker-specific information captured through the spectral features. In addition, to build modern SID systems, i-vectors, time delay neural networks (TDNN), and recurrent neural networks with long short-term memory (LSTM-RNN) are considered. Equal error rate (EER) is used as the performance metric of the SID system. The performance of the developed systems is analyzed on emotional native and non-native language corpora in terms of SID accuracy across the six experiments.

Keywords: Low resource language (LRL); Speaker identification (SID); Shifted delta cepstral (SDC); Partial correlation (PARCOR) coefficients; i-vectors; Linear discriminant analysis (LDA); Probabilistic linear discriminant analysis (PLDA); Deep neural network (DNN); Time delay neural networks (TDNN); Recurrent neural network (RNN); Long short-term memory (LSTM)
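
The abstract names equal error rate (EER) as the evaluation measure but does not describe the scoring protocol. As a minimal sketch, assuming each trial yields a similarity score and is labeled as either a target (same-speaker) or non-target trial, EER can be approximated as the operating point where the false acceptance rate and the false rejection rate are closest. The function name `equal_error_rate` and the example scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from two sets of trial scores.

    Assumption: a higher score means the system is more confident that
    the trial is a target (same-speaker) trial.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, ascending.
    thresholds = np.unique(np.concatenate([target, nontarget]))

    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Hypothetical scores, for illustration only.
eer, thr = equal_error_rate([0.9, 0.8, 0.4, 0.6], [0.1, 0.5, 0.2, 0.3])
print(f"EER = {eer:.2f} at threshold {thr:.2f}")
```

With only a handful of trials the estimate is coarse; on larger score sets, interpolating between adjacent thresholds gives a smoother value.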

