Improved Speech-Signal Based Frequency Warping Scale for Cepstral Feature in Robust Speaker Verification System

Susanta Kumar Sarangi, Goutam Saha

doi:10.1007/s11265-020-01517-2

Abstract

Developing automatic speaker verification (ASV) systems for real-world applications remains a major challenge. In this paper, we propose an improved speech-signal-based frequency warping scale for extracting cepstral features from the speech signal in ASV applications. The proposed scale is a modified version of the speech-signal-based scale that has been used successfully in speech recognition, an allied domain. It uses spectral-entropy-weighted power spectral density to extract speaker-specific attributes. Because it emphasizes spectral regions differently, the resulting feature is complementary to the fixed-scale mel frequency cepstral coefficient (MFCC). The work uses a fusion-based approach to exploit the complementarity of static MFCC and the proposed feature. The performance of ASV systems based on MFCC and on the proposed technique is evaluated in clean and various noisy conditions on publicly available NIST SRE databases. A noise database (NOISEX-92) is used to simulate the noisy environments. The ASV system built on the proposed feature extraction method performs slightly better than the baseline MFCC and SFCC (speech-signal-based frequency cepstral coefficient) techniques in clean conditions, and improves on them by up to 38.15% and 17.15%, respectively, in noisy conditions. The fusion-based approach further improves the performance of the ASV system, with up to 53.85% and 36.22% relative improvement over the baseline MFCC and SFCC feature extraction methods, respectively.
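The percentages above are reported as relative improvements over a baseline. As a minimal sketch of that arithmetic, assuming the underlying quantity is an error metric where lower is better (the abstract does not name the metric), the calculation might look as follows. The function name `relative_improvement` and the numeric values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, proposed_error: float) -> float:
    """Percentage reduction of an error metric relative to a baseline.

    Assumes a lower-is-better metric; which metric the paper uses
    is not stated in the abstract.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical error rates, chosen only to illustrate the arithmetic.
example = relative_improvement(baseline_error=10.0, proposed_error=7.5)
print(f"{example:.2f}%")  # prints "25.00%"
```

Under this definition, a value of 0% means the proposed system matches the baseline, and a negative value means it performs worse.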

Full Text