Analysis of the Intrinsic Mode Functions for Speaker Information

Rajib Sharma, S.R.M Prasanna, Ramesh K Bhukya, Rohan Kumar Das

doi:10.1016/j.specom.2017.04.006

Abstract

This work explores the utility of the time-domain signal components, or Intrinsic Mode Functions (IMFs), of speech signals, as generated by the data-adaptive filterbank nature of Empirical Mode Decomposition (EMD), in characterizing speakers for the task of text-independent Speaker Verification (SV). A modified version of EMD, denoted as MEMD, which extracts IMFs with less mode-mixing and provides a better representation of the higher frequency spectrum of speech, is also utilized for the SV task. Three different features are extracted over 20 ms frames from the IMFs of EMD and MEMD. They are then tested individually, and in conjunction with the Mel Frequency Cepstral Coefficients (MFCCs), for SV. Two corpora - the NIST SRE 2003 corpus and the CHAINS corpus - are used for the experiments. The results evaluated on the NIST SRE 2003 database, using the i-vector framework, reveal that the features extracted from the IMFs, in conjunction with the MFCCs, enhance the performance of the SV system. Further, it is observed that only a small set of lower-order IMFs is useful and necessary for characterizing speaker-specific information. The combination of the features with the MFCCs is also found to be useful when short speech utterances of ≤10 s are used for testing. Similarly, the results evaluated on the CHAINS corpus, using the conventional Gaussian Mixture Model (GMM) framework, reveal that the features, in combination with the MFCCs, enhance the performance of the SV system, not only for normal speech, but also for fast and whispered speech. Again, it is observed that only the first few IMFs are needed and useful for achieving such enhanced performance.

Journal: Speech Communication	Publication Date: Apr 27, 2017
Citations: 17

Similar Papers

Analysis of the Hilbert Spectrum for Text-Dependent Speaker Verification
Rajib Sharma ... S.R.M Prasanna
Speech Communication | VOL. 96
07 Dec 2017

Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection
Maulik C Madhavi ... Hemant A Patil
Computer Speech & Language | VOL. 58
23 Mar 2019

Application of the EMD Decomposition to Discriminate Nasalized vs. Vowels Phones in French
M.M Saidi ... O Pietquin
01 Jan 2009

Separation of Surface Roughness Profile from Raw Contour based on Empirical Mode Decomposition
Hui Zhang ... Shoubin Liu
01 Jan 2015
