Abstract
This work explores the utility of the time-domain signal components, or the Intrinsic Mode Functions (IMFs), of speech signals’, as generated from the data-adaptive filterbank nature of Empirical Mode Decomposition (EMD), in characterizing speakers for the task of text-independent Speaker Verification (SV). A modified version of EMD, denoted as MEMD, which extracts IMFs with lesser mode-mixing, and provides a better representation of the higher frequency spectrum of speech, is also utilized for the SV task. Three different features are extracted over 20 ms frames, from the IMFs of EMD and MEMD. They are, then, tested individually, and in conjunction with the Mel Frequency Cepstral Coefficients (MFCCs), for SV. Two corpora - the NIST SRE 2003 corpus, and the CHAINS corpus - are used for the experiments. The results evaluated on the NIST SRE 2003 database, using the i-vector framework, reveal that the features extracted from the IMFs, in conjunction with the MFCCs, enhances the performance of the SV system. Further, it is observed that only a small set of lower-order IMFs is useful and necessary for characterizing speaker-specific information. The combination of the features with the MFCCs is also found to be useful when short speech utterances of ≤10 s are used for testing. Similarly, the results evaluated on the CHAINS corpus, using the conventional Gaussian Mixture Model (GMM) framework, reveal that the features, in combination with the MFCCs, enhance the performance of the SV system, not only for normal speech, but also for fast and whispered speech. Again, it is observed that only the first few IMFs are needed and useful for achieving such enhanced performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.