Abstract

The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration, which is known to be highly speaker-dependent. This work introduces several features that characterize the vocal system in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC), spectral bandwidth (SBW), spectral band energy (SBE), spectral crest factor (SCF), spectral flatness measure (SFM), Shannon entropy (SE), and Renyi entropy (RE) are used for this purpose. The work demonstrates that these features are robust in noisy conditions by simulating common distortions found in the speaker's environment and in a typical telephone channel: babble noise, additive white Gaussian noise (AWGN), and a bandpass channel with 1 dB of ripple. The results show significant improvements in classification performance for all noise conditions when these features are used to complement the MFCC and ΔMFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10–40 dB). For example, with only one source of distortion, the SCF feature yielded classification improvements of up to 8% under babble noise and up to 10% under AWGN.
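The seven descriptors named in the abstract can be illustrated with a short sketch. It uses common textbook definitions computed over one band of a power spectrum, which are not necessarily the exact formulations used in the paper. The function name `spectral_features`, the Renyi order `alpha=3.0`, and the guard constant `eps` are all assumptions made for illustration. The input is assumed to be the power spectrum of a single windowed speech frame, for example from an FFT.

```python
import math

def spectral_features(power, freqs, alpha=3.0):
    """Spectral descriptors for one band of a power spectrum.

    power: non-negative power values |X[k]|^2 for the bins in the band
    freqs: centre frequency in Hz of each bin (same length as power)
    alpha: Renyi entropy order (assumed value; must not equal 1)
    """
    eps = 1e-12                      # guards against log(0) and division by zero
    n = len(power)
    total = sum(power) + eps
    p = [v / total for v in power]   # spectrum normalized to act like a PMF

    sc = sum(f * w for f, w in zip(freqs, p))               # power-weighted mean frequency
    sbw = sum((f - sc) ** 2 * w for f, w in zip(freqs, p))  # power-weighted variance about SC
    arith_mean = total / n
    geo_mean = math.exp(sum(math.log(v + eps) for v in power) / n)

    return {
        "SC": sc,
        "SBW": sbw,
        "SBE": total,                                   # unnormalized band energy
        "SCF": max(power) / arith_mean,                 # peak relative to mean
        "SFM": geo_mean / arith_mean,                   # near 1 for flat, near 0 for peaky
        "SE": -sum(w * math.log2(w) for w in p if w > 0),
        "RE": math.log2(sum(w ** alpha for w in p)) / (1 - alpha),
    }

# A flat band and a single-peak band over the same four bins
bin_freqs = [100.0, 200.0, 300.0, 400.0]
flat = spectral_features([1.0, 1.0, 1.0, 1.0], bin_freqs)
peaky = spectral_features([0.0, 0.0, 1.0, 0.0], bin_freqs)
```

In this sketch a flat band gives a flatness and crest factor near 1 and maximal entropy, while a band dominated by one bin gives a flatness near 0 and entropy near 0. That is the kind of spectral-shape information these features add to cepstral coefficients.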

Highlights

  • Speaker recognition has many potential applications as a biometric tool since there are many tasks that can be performed remotely using speech

  • The recordings were made in an acoustically quiet environment using a high-quality microphone, and distortions were added to simulate a practical telephone channel. These distortions included bandpass filtering (300 Hz–3.4 kHz) to simulate the characteristics of a telephone channel, babble noise to simulate background speakers, and additive white Gaussian noise (AWGN) to simulate ordinary background noise

  • These results show that the spectral centroid (SC), spectral band energy (SBE), spectral bandwidth (SBW), spectral crest factor (SCF), and Renyi entropy (RE) features capture some speaker-dependent information, as they improved identification rates when combined with the standard Mel frequency cepstral coefficient (MFCC)-based features
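The additive distortions described in the highlights can be sketched as mixing a noise signal into clean speech at a chosen signal-to-noise ratio. This is a minimal illustration, not the authors' procedure. The helper names `mean_power`, `mix_at_snr`, and `white_gaussian_noise` are hypothetical, and a synthetic sine wave stands in for a speech recording. The same scaling would apply to a babble recording passed as `noise`. The bandpass channel is not covered here.

```python
import math
import random

def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db, then add it."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [c + gain * n for c, n in zip(clean, noise)]

def white_gaussian_noise(n, seed=None):
    """Zero-mean, unit-variance Gaussian samples (rescaled later by mix_at_snr)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Stand-in for clean speech: one second of a 440 Hz tone sampled at 8 kHz
fs = 8000
clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noisy = mix_at_snr(clean, white_gaussian_noise(len(clean), seed=0), snr_db=10)
```

Because the gain is derived from the measured power of the noise realization, the mixture meets the requested SNR exactly rather than only on average. This makes it straightforward to sweep a range such as 10 to 40 dB.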


Summary

Introduction

Speaker recognition has many potential applications as a biometric tool since there are many tasks that can be performed remotely using speech. In telephone-based applications (e.g., banking or customer service), costly crimes such as identity theft or fraud can be prevented by enhanced security protocols. In these applications, the identity of users cannot be verified in person because there is no direct contact between the user and the service provider. Speaker recognition is performed by extracting speaker-dependent characteristics from speech signals. For this purpose, the speaker's vocal tract configuration is recognized as highly speaker-dependent because of anatomical and behavioral differences between subjects. Many techniques have been proposed for characterizing the vocal tract configuration from speech signals; a good review of these techniques is provided in [1].

