Abstract
This paper explores the significance of stereo-based stochastic feature compensation (SFC) methods for robust speaker verification (SV) in mismatched training and test environments. Gaussian Mixture Model (GMM)-based SFC methods developed in the past have been restricted to speech recognition tasks. This paper proposes applying these algorithms in an SV framework for background noise compensation. A priori knowledge of the test environment and the availability of stereo training data are assumed. During the training phase, Mel frequency cepstral coefficient (MFCC) features extracted from a speaker's noisy and clean speech utterances (stereo data) are used to build front-end GMMs. During the evaluation phase, noisy test utterances are transformed on the basis of a minimum mean squared error (MMSE) or maximum likelihood estimate (MLE), using the target speaker GMMs. Experiments conducted on the NIST-2003-SRE database, with clean speech utterances artificially degraded by different types of additive noise, reveal that the proposed SV systems consistently outperform the baseline SV systems in mismatched conditions across all noisy background environments.
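As an illustration of the MMSE transformation mentioned above, the following is a minimal sketch of one common stereo-based formulation, in which a GMM is fitted to joint [noisy, clean] feature vectors and the clean features are estimated as the posterior-weighted conditional mean. It is not necessarily the exact method evaluated in this paper. The function name `mmse_compensate`, the parameter layout, and the assumption that the joint GMM parameters have already been trained (for example by EM on stereo MFCC pairs) are all hypothetical choices made for this example.

```python
import numpy as np

def mmse_compensate(noisy, weights, means, covs):
    """MMSE estimate of clean features from noisy ones under a joint GMM.

    Assumed (hypothetical) parameter layout:
      noisy:   (T, D)      noisy feature frames, e.g. MFCCs
      weights: (K,)        mixture weights
      means:   (K, 2D)     joint means, ordered [noisy part, clean part]
      covs:    (K, 2D, 2D) full joint covariances in the same ordering
    """
    noisy = np.atleast_2d(noisy)
    T, D = noisy.shape
    K = len(weights)
    log_post = np.empty((T, K))
    cond_means = np.empty((K, T, D))
    for k in range(K):
        mu_y, mu_x = means[k, :D], means[k, D:]
        S_yy = covs[k, :D, :D]   # noisy-noisy covariance block
        S_xy = covs[k, D:, :D]   # clean-noisy cross-covariance block
        diff = noisy - mu_y
        # Rows of `solved` hold S_yy^{-1} (y_t - mu_y)
        solved = np.linalg.solve(S_yy, diff.T).T
        _, logdet = np.linalg.slogdet(S_yy)
        maha = np.sum(diff * solved, axis=1)
        # Log of weight times the Gaussian likelihood of each noisy frame
        log_post[:, k] = np.log(weights[k]) - 0.5 * (
            maha + logdet + D * np.log(2.0 * np.pi)
        )
        # Conditional mean E[x | y, k] = mu_x + S_xy S_yy^{-1} (y - mu_y)
        cond_means[k] = mu_x + solved @ S_xy.T
    # Normalise to component posteriors p(k | y_t) in a numerically stable way
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of the per-component conditional means
    return np.einsum("tk,ktd->td", post, cond_means)
```

In such a setup the compensated frames, rather than the raw noisy frames, would be passed on to the SV back end for scoring. Full covariances are used here so that the cross-covariance between noisy and clean features is available; with diagonal joint covariances the estimate reduces to a posterior-weighted sum of the clean-part means.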