Abstract

Phonetic variability is one of the primary challenges in short-duration speaker verification. This paper proposes a novel method that replaces the standard normal prior of the total variability model with a mixture of Gaussians. The proposed speaker-phonetic vectors are then estimated from the posterior distribution of the latent variables, and each vector carries a phonetic meaning. Unlike the standard total variability model, the proposed method can incorporate a phoneme classifier to perform soft content matching, which has the potential to address the phonetic variability problem. Parameter estimation and scoring formulae for the speaker-phonetic vector method are presented. Experimental results on NIST 2010 data show that the proposed technique yields relative improvements of more than 30% when fused with the total variability model and tested on 3-second test files.
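
As background for the approach summarized above: in the standard total variability model the latent vector w has a standard normal prior, and, given an utterance's zeroth- and first-order Baum-Welch statistics N and F̃, its posterior is Gaussian with precision L = I + TᵀΣ⁻¹NT and mean L⁻¹TᵀΣ⁻¹F̃ (the usual i-vector). If the prior is instead a mixture Σ_k π_k N(μ_k, C_k), the posterior is also a mixture: component k has precision L_k = C_k⁻¹ + TᵀΣ⁻¹NT, mean m_k = L_k⁻¹(C_k⁻¹μ_k + TᵀΣ⁻¹F̃), and a responsibility proportional to π_k times the marginal likelihood of the statistics under that component. The pure-Python sketch below implements this generic linear-Gaussian result. It is not the authors' published estimation or scoring formulae; the function names, the precomputed inputs P = TᵀΣ⁻¹NT and b = TᵀΣ⁻¹F̃, and the diagonal prior precisions are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers: a generic posterior under a Gaussian-mixture prior,
# NOT the paper's exact speaker-phonetic vector formulae.
Matrix = List[List[float]]
Vector = List[float]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low: Matrix, rhs: Vector) -> Vector:
    """Solve (low @ low^T) x = rhs by forward, then back substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def mixture_prior_posterior(
    data_prec: Matrix,                  # assumed P = T' Sigma^-1 N T  (R x R)
    data_vec: Vector,                   # assumed b = T' Sigma^-1 F~   (length R)
    weights: Sequence[float],           # prior mixture weights pi_k
    means: Sequence[Vector],            # prior component means mu_k
    prior_prec_diag: Sequence[Vector],  # diagonal of C_k^-1 (assumed diagonal)
) -> Tuple[List[float], List[Vector]]:
    """Return responsibilities gamma_k and per-component posterior means m_k."""
    r = len(data_vec)
    log_scores: List[float] = []
    post_means: List[Vector] = []
    for pi_k, mu_k, lam_k in zip(weights, means, prior_prec_diag):
        # Posterior precision L_k = C_k^-1 + P
        prec = [[data_prec[i][j] + (lam_k[i] if i == j else 0.0) for j in range(r)]
                for i in range(r)]
        low = cholesky(prec)
        # Posterior mean m_k = L_k^-1 (C_k^-1 mu_k + b)
        rhs = [lam_k[i] * mu_k[i] + data_vec[i] for i in range(r)]
        m_k = chol_solve(low, rhs)
        logdet_post = 2.0 * sum(math.log(low[i][i]) for i in range(r))
        logdet_prior = sum(math.log(v) for v in lam_k)
        quad_post = sum(m_k[i] * rhs[i] for i in range(r))  # m' L m, since L m = rhs
        quad_prior = sum(lam_k[i] * mu_k[i] ** 2 for i in range(r))
        # log pi_k + log marginal likelihood of the stats (up to a shared constant)
        log_scores.append(math.log(pi_k)
                          + 0.5 * (logdet_prior - logdet_post + quad_post - quad_prior))
        post_means.append(m_k)
    top = max(log_scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm], post_means
```

With a single zero-mean, identity-precision component the sketch reduces to the standard i-vector posterior mean, which is a quick sanity check. Reading the per-component means m_k as phonetically labeled vectors is our interpretation of the abstract, not a detail it states.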
