Abstract

Developing automatic speaker verification (ASV) systems for real-world applications remains a major challenge. In this paper, we propose an improved speech-signal-based frequency warping scale for extracting cepstral features from the speech signal for ASV. The proposed scale is a modified version of the speech-signal-based scale, which has been used successfully in speech recognition, an allied domain. It uses the spectral-entropy-weighted power spectral density to extract speaker-specific attributes. Because it emphasizes spectral regions differently, the resulting feature is complementary to the fixed-scale mel-frequency cepstral coefficient (MFCC). We use a fusion-based approach to exploit the complementarity of static MFCC and the proposed feature. ASV systems based on MFCC and on the proposed technique are evaluated in clean and various noisy conditions on publicly available NIST SRE databases. The NOISEX-92 noise database is used to simulate the noisy environments. The ASV system built on the proposed feature extraction method performs slightly better than the baseline MFCC and SFCC (speech-signal-based frequency cepstral coefficient) techniques in clean conditions, and improves on them by up to 38.15% and 17.15%, respectively, in noisy conditions. The fusion-based approach further improves ASV performance, with relative improvements of up to 53.85% and 36.22% over the baseline MFCC and SFCC feature extraction methods, respectively.
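To make the spectral entropy term concrete, the following is a minimal sketch of how the spectral entropy of a single speech frame can be computed from its power spectrum. It is not the paper's implementation: the function name `spectral_entropy`, the Hamming window, the FFT size, and the use of base-2 logarithms are all assumptions made for illustration, and how the entropy is then used to weight the power spectral density and derive the warping scale is not specified in this abstract.

```python
import numpy as np

def spectral_entropy(frame, n_fft=512):
    """Shannon entropy (in bits) of a frame's normalized power spectrum.

    Hypothetical helper for illustration only; window type and FFT size
    are assumptions, not values taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # One-sided power spectrum of the windowed frame.
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    # Normalize so the spectrum can be treated as a probability distribution.
    prob = power / (power.sum() + 1e-12)
    # Small constant avoids log2(0) for empty bins.
    return float(-np.sum(prob * np.log2(prob + 1e-12)))

# Usage: a flat (noise-like) spectrum has higher entropy than a peaky one.
rng = np.random.default_rng(0)
noise_frame = rng.standard_normal(400)
tone_frame = np.sin(2 * np.pi * 1000 * np.arange(400) / 16000)
noise_entropy = spectral_entropy(noise_frame)
tone_entropy = spectral_entropy(tone_frame)
```

A frame whose energy is spread evenly across frequency bins yields an entropy close to the maximum, `log2(n_fft // 2 + 1)`, while a frame dominated by a few spectral peaks yields a much lower value.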
