Abstract

This paper describes and discusses the speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features, 2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM, and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods-either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.