Significance of analytic phase of speech signals in speaker verification

Karthika Vijayan, Pappagari Raghavendra Reddy, K Sri Rama Murty

doi:10.1016/j.specom.2016.02.005

Abstract

The objective of this paper is to establish the importance of the phase of the analytic signal of speech, referred to as the analytic phase, in human perception of speaker identity as well as in automatic speaker verification. Subjective studies are conducted using speech signals with distorted analytic phase, and the resulting difficulties in the human speaker verification task are observed. Motivated by these perceptual studies, we propose a method for extracting features from the analytic phase of speech signals. Because the analytic phase cannot be computed unambiguously owing to the phase wrapping problem, features are extracted from its derivative, the instantaneous frequency (IF). The IF is computed by exploiting properties of the Fourier transform, a strategy that is free from the phase wrapping problem. The IF is computed from narrowband components of the speech signal, and the discrete cosine transform is applied to the deviations in IF to pack the information into a smaller number of coefficients, referred to as IF cosine coefficients (IFCCs). The nature of the information in the proposed IFCC features is studied using minimal-pair ABX (MP-ABX) tasks and t-distributed stochastic neighbor embedding (t-SNE) visualizations. The performance of the IFCC features is evaluated on the NIST 2010 SRE database and compared with mel frequency cepstral coefficients (MFCCs) and frequency domain linear prediction (FDLP) features. All three features, IFCC, FDLP and MFCC, provided competitive speaker verification performance, with average equal error rates (EERs) of 2.3%, 2.2% and 2.4%, respectively. The IFCC features are more robust to vocal effort mismatch, providing relative improvements of 26% and 11% over MFCC and FDLP features, respectively, on the evaluation conditions involving vocal effort mismatch. Since magnitude and phase represent different components of the speech signal, we fuse the evidence from them at the i-vector level of the speaker verification system. The i-vector fusion is found to be considerably better than conventional score fusion. The i-vector fusion of FDLP+IFCC features provided a relative improvement of 36% over the system based on FDLP features alone, while the fusion of MFCC+IFCC features provided a relative improvement of 37% over the system based on MFCC features alone, illustrating that the proposed IFCC features provide speaker-specific information complementary to that of the magnitude-based FDLP and MFCC features.
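As a rough illustration of how the IF can be obtained without unwrapping the phase, the sketch below uses the identity θ'(t) = Im(z'(t) z*(t)) / |z(t)|², where z(t) is the analytic signal and the derivative z'(t) is computed through the Fourier-transform differentiation property. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`analytic_signal`, `instantaneous_frequency`, `dct2`) are hypothetical, the narrowband filterbank is omitted, and the abstract does not specify how IF deviations are arranged before the DCT, so `dct2` only shows a plain DCT-II truncated to a few coefficients.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive bins, zero negative bins."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def instantaneous_frequency(x, fs):
    """IF in Hz from Im(z' * conj(z)) / |z|^2; the phase itself is never computed."""
    z = analytic_signal(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(len(x), d=1.0 / fs)   # rad/s per FFT bin
    dz = np.fft.ifft(1j * omega * np.fft.fft(z))               # d/dt via Fourier property
    eps = 1e-12                                                # guards against |z| = 0
    return np.imag(dz * np.conj(z)) / (np.abs(z) ** 2 + eps) / (2.0 * np.pi)

def dct2(v, n_coeffs):
    """Unnormalized DCT-II of v, truncated to the first n_coeffs coefficients."""
    v = np.asarray(v, dtype=float)
    n = np.arange(len(v))
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi / len(v) * (n + 0.5) * k)
    return basis @ v

# Usage: a 100 Hz tone stands in for one narrowband component (assumed values).
fs = 8000
t = np.arange(fs) / fs
if_track = instantaneous_frequency(np.cos(2.0 * np.pi * 100.0 * t), fs)
deviation = if_track - 100.0           # deviation of IF from the band's center frequency
coeffs = dct2(deviation[:200], 10)     # compact representation of one 25 ms segment
```

Because the IF is formed from the signal, its Hilbert transform and their derivatives, no arctangent is evaluated and no wrapped phase has to be unwrapped. Applying this to real speech would additionally require a narrowband filterbank and framing, which the abstract mentions but does not detail.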

Journal: Speech Communication
Publication Date: Feb 26, 2016
Citations: 41

Similar Papers

Fusion of TEO Phase with MFCC Features for Speaker Verification
Purvi Agrawal ... Hemant A Patil
26 Feb 2015

Effective preservation of higher-frequency contents in the context of short utterance based children’s speaker verification system
Shahid Aziz ... S Shahnawazuddin
Applied Acoustics | VOL. 209
10 May 2023

Text-Independent Speaker Identification by Combining MFCC and MVA Features
Mohamed Cherif Amara Korba ... Djemili Rafik
01 Nov 2018

Text Dependent Speaker Identification in Noisy Environment
Pawan Kumar ... Nitika Jakhanwal
01 Feb 2011
