Abstract
In recent years, there has been significant work on using temporal features of the speech excitation source, viz., the Linear Prediction (LP) residual and its analytic or instantaneous phase, the group delay method, the glottal flow derivative, etc., for the speaker recognition task. This paper proposes score-level fusion of Teager Energy Operator (TEO) phase features with Mel Frequency Cepstral Coefficient (MFCC) features for the text-independent Speaker Verification (SV) task. Experiments were performed on an SV system based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM) framework. The proposed system, which fuses TEO phase with MFCC, improves SV accuracy by 3.10% over state-of-the-art MFCC features alone. On the 2002 NIST Speaker Recognition Evaluation (SRE) database, this score-level fusion outperforms MFCCs alone under both matched and mismatched test conditions. This indicates that TEO phase contains information complementary to the MFCC features.
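As a rough illustration of the two building blocks named above, the following sketch computes the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] x[n+1], and combines two per-trial score streams with a weighted sum. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify how the TEO phase is derived from the TEO profile or which fusion rule is used, so the phase extraction step is omitted, and the function names and the weight `alpha` are hypothetical.

```python
import numpy as np


def teager_energy(x):
    """Discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1] for interior samples.

    Returns an array two samples shorter than the input, because the
    first and last samples have no neighbour on one side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def fuse_scores(mfcc_scores, teo_scores, alpha=0.5):
    """Weighted-sum score-level fusion (an assumed rule, not from the paper).

    alpha is a hypothetical weight on the MFCC system; (1 - alpha) goes to
    the TEO-phase system. Both inputs hold one score per verification trial.
    """
    mfcc = np.asarray(mfcc_scores, dtype=float)
    teo = np.asarray(teo_scores, dtype=float)
    if mfcc.shape != teo.shape:
        raise ValueError("score arrays must have the same shape")
    return alpha * mfcc + (1.0 - alpha) * teo


# For a pure sinusoid A*cos(omega*n), the TEO is the constant A^2 * sin(omega)^2.
n = np.arange(200)
amplitude, omega = 2.0, 0.3
psi = teager_energy(amplitude * np.cos(omega * n))

# Toy per-trial scores from two hypothetical subsystems.
fused = fuse_scores([1.0, 2.0], [3.0, 4.0], alpha=0.25)
```

In practice the scores of the two subsystems would typically be normalised to a common range before fusion, and the weight would be tuned on held-out development data rather than fixed in advance.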