Implementation and Evaluation of a Modified Mel-Frequency Cepstral Coefficients based Text Independent Automatic Speaker Recognition System

Amit Kumar Singh, Amit Rathi, Rohit Singh, Mangal Das, Ashutosh Dwivedi

doi:10.1109/iihc55949.2022.10060563

Abstract

Speech feature extraction is the most important step in any automatic speech recognition system, and every feature extraction technique aims for features that are as accurate and robust as possible. Decades of research in automatic speaker recognition have produced several new algorithms and advances; however, a number of challenges remain, largely due to variations in a speaker's vocal tract over time and with health, changing ambient conditions, and variations in the performance of speech recording systems. Mel-frequency cepstral coefficients (MFCC) are extensively used in automatic speaker recognition systems. The triangular mel weighting function is a key component of the MFCC feature extraction technique. In this paper, various existing modifications of the mel weighting function are presented, and a sinc-function-based mel weighting function is proposed. MFCC techniques based on the traditional triangular weighting function and on a Gaussian weighting function have been evaluated on the TSP speech database, which contains 11 male and 12 female speakers with utterances of varying lengths. Speaker recognition used a distortion measure between extracted features based on the minimum Euclidean distance. For speech segments of 2 seconds, the recognition success rate was 92% with the traditional triangular mel weighting function and 96% with the Gaussian mel weighting function.

Full Text