Combining Source and Vocal Tract Information for Text Dependent Speaker Verification

Ramesh K Bhukya

doi:10.1109/cict56698.2022.9997902

Abstract

Text-dependent speaker verification (TDSV) remains challenging to implement and deploy in practice, for example in a speech-biometric-based online attendance system. Speech utterances collected in such settings are affected by environmental conditions and background noise, which cause system performance to drop drastically. To improve performance, the begin and end points of each utterance are detected using an energy-based end point detection (EPD) method. This work then considers a recently proposed excitation source feature, the discrete cosine transform of the integrated linear prediction residual (DCTILPR), alongside vocal tract Mel-frequency cepstral coefficient (MFCC) features. Two parallel dynamic time warping (DTW) based speaker verification systems are developed, one using the DCTILPR features and one using the MFCC features, and are evaluated on the IITG database, which was recorded for attendance marking over a six-month period. The widely used MFCC features serve as the baseline for the TDSV system. The individual performance of the DCTILPR and MFCC features is reported, and their score-level combination shows a significant improvement over the standalone MFCC baseline.

Full Text
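
The processing chain summarized in the abstract (energy-based end point detection, DTW matching of each feature stream, and score-level combination) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the relative energy threshold, the frame length and hop, the length-normalized DTW distance, and the equal fusion weight are all assumptions chosen for illustration, and the MFCC and DCTILPR feature vectors are assumed to have been extracted already.

```python
from math import sqrt


def frame_energies(signal, frame_len, hop):
    """Short-time energy (sum of squared samples) of each full frame."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def detect_endpoints(signal, frame_len=4, hop=2, rel_threshold=0.1):
    """Return (begin, end) sample indices of the speech region.

    A frame counts as speech when its energy exceeds
    rel_threshold * (maximum frame energy). The threshold rule is an
    assumption; the paper's exact EPD settings are not given here.
    Returns None when no frame carries energy.
    """
    energies = frame_energies(signal, frame_len, hop)
    if not energies or max(energies) == 0:
        return None
    limit = rel_threshold * max(energies)
    active = [k for k, e in enumerate(energies) if e > limit]
    begin = active[0] * hop
    end = active[-1] * hop + frame_len
    return begin, end


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    Uses a Euclidean local cost and normalizes the accumulated cost by
    len(a) + len(b) (an assumed normalization) so that utterances of
    different durations give comparable scores.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sqrt(sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])))
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m] / (n + m)


def fuse_scores(mfcc_score, dctilpr_score, weight=0.5):
    """Score-level combination as a weighted sum (weight is hypothetical)."""
    return weight * mfcc_score + (1.0 - weight) * dctilpr_score
```

In use, a test utterance would first be trimmed with `detect_endpoints`, each feature stream would then be compared against the claimed speaker's enrolled template with `dtw_distance`, and the two resulting distances would be merged by `fuse_scores` before being compared with a decision threshold. In practice the two scores would likely need to be normalized to a common range before fusion, which this sketch omits.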