Abstract
In the last few years, there has been significant work on using temporal features of the speech excitation source, viz., the Linear Prediction (LP) residual and its analytic or instantaneous phase, the group delay method, the glottal flow derivative, etc., for the speaker recognition task. In this paper, score-level fusion of Teager Energy Operator (TEO) phase with Mel Frequency Cepstral Coefficients (MFCC) features is proposed for the text-independent Speaker Verification (SV) task. Experiments are performed on an SV system based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM). The proposed SV system, which fuses TEO phase with MFCC, improves the accuracy of the SV system by 3.10% over state-of-the-art MFCC features. On the 2002 NIST Speaker Recognition Evaluation (SRE) database, this score-level fusion performs better than MFCCs alone under both matched and mismatched test conditions. This indicates that TEO phase contains information that is complementary to the MFCC features.
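As a rough illustration of the two ingredients named above, the sketch below computes the standard discrete-time Teager energy profile, psi[x(n)] = x(n)^2 - x(n-1) x(n+1), and combines two per-trial score lists with a weighted sum. The abstract does not specify how the TEO phase is extracted or which fusion rule is used, so the function names `teager_energy` and `fuse_scores` and the weight `alpha` are assumptions made for illustration only, not the authors' method.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).

    Returns len(x) - 2 values, since the first and last samples
    lack one of their two neighbours.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def fuse_scores(mfcc_scores, teo_scores, alpha=0.5):
    """Weighted-sum score-level fusion (hypothetical rule and weight).

    alpha weights the MFCC system and (1 - alpha) the TEO-phase system.
    In practice the two score streams would usually be normalised to a
    comparable range before fusing; that step is omitted here.
    """
    mfcc = np.asarray(mfcc_scores, dtype=float)
    teo = np.asarray(teo_scores, dtype=float)
    return alpha * mfcc + (1.0 - alpha) * teo


if __name__ == "__main__":
    # For a pure tone A*cos(omega*n), the Teager energy is the constant
    # A^2 * sin^2(omega), which makes a convenient sanity check.
    n = np.arange(100)
    tone = 2.0 * np.cos(0.3 * n)
    print(teager_energy(tone)[:3])
    print(fuse_scores([1.0, 2.0], [3.0, 4.0], alpha=0.25))
```

A weighted sum is only one common choice for score-level fusion; the weight would normally be tuned on held-out development trials rather than fixed in advance.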