Abstract
This paper describes a novel approach to discriminative modeling and its application to automatic text-independent speaker verification. The approach estimates a low-dimensional linear kernel that maximizes the margin between the model scores for pairs of utterances belonging to the same speaker and the scores for pairs of utterances belonging to different speakers. It thereby emphasizes the features that best discriminate between same-speaker and different-speaker pairs. We apply this approach to the NIST 2005 speaker verification task. Compared to the Gaussian mixture model (GMM) baseline system, we obtain a 17.7% relative improvement in the minimum detection cost function (DCF) and an 11.7% relative improvement in equal error rate (EER). We also achieve a 5.7% relative improvement in EER and a 2.3% relative improvement in DCF by applying our approach on top of a nuisance attribute projection (NAP) compensated GMM-based kernel baseline system.
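To make the pairwise margin idea concrete, the following is a minimal sketch and not the system evaluated in this paper. It assumes a hinge loss over labeled utterance pairs and a kernel of the form k(x, y) = (Ax) . (Ay), where A is a d x D projection matrix with d < D. The names A, bias, margin, and pairwise_hinge_loss, as well as the toy vectors, are illustrative assumptions. The sketch only evaluates the objective for a fixed A and does not estimate A.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
# A labeled pair: two utterance vectors and +1 (same speaker) or -1 (different).
Pair = Tuple[Vector, Vector, int]


def project(A: Matrix, x: Vector) -> List[float]:
    """Map a D-dimensional vector x to d dimensions using the rows of A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def kernel_score(A: Matrix, x: Vector, y: Vector) -> float:
    """Linear kernel in the projected space: k(x, y) = (A x) . (A y)."""
    px, py = project(A, x), project(A, y)
    return sum(u * v for u, v in zip(px, py))


def pairwise_hinge_loss(A: Matrix, pairs: Sequence[Pair],
                        bias: float = 0.0, margin: float = 1.0) -> float:
    """Average hinge loss over pairs (assumed objective, not the paper's).

    A pair contributes zero loss once its score lies on the correct side
    of `bias` by at least `margin`.
    """
    total = 0.0
    for x, y, label in pairs:
        total += max(0.0, margin - label * (kernel_score(A, x, y) - bias))
    return total / len(pairs)


# Toy data (hypothetical): only the first dimension carries speaker identity.
pairs: List[Pair] = [
    ((2.0, 5.0, 0.0), (2.0, -3.0, 1.0), +1),   # same speaker
    ((1.0, 1.0, 1.0), (-1.0, 1.0, 1.0), -1),   # different speakers
]

A_full = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # no reduction
A_low = [[1.0, 0.0, 0.0]]                                      # d = 1

loss_full = pairwise_hinge_loss(A_full, pairs)
loss_low = pairwise_hinge_loss(A_low, pairs)
print(loss_full, loss_low)
```

On this toy data the one-dimensional projection separates the two pairs with the required margin, while the unreduced kernel does not. This is the effect the approach aims for when it emphasizes discriminative features.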