Abstract

This study addresses the problem of sparse enrollment data for in-set versus out-of-set speaker recognition. The challenge is that both the training data for each speaker (5 s) and the test material (2 to 6 s) are of limited duration. The limited enrollment data result in a sparse acoustic model space for the desired speaker model. The focus of this study is on filling these acoustic holes by harvesting neighbor speaker information to improve overall system performance. Acoustically similar speakers are selected from a separate available corpus using three different methods of speaker similarity measurement. The data selected from these acoustically similar speakers are used to compensate for the lack of phone coverage caused by the original sparse enrollment data. The proposed speaker modeling process mimics the naturally distributed acoustic space of conversational speech. The Gaussian mixture model (GMM) tagging process allows simulated natural conversational speech to be included for in-set speaker modeling, which maintains the original system requirement of text-independent speaker recognition. A human listener evaluation is also performed to compare machine and human speaker recognition performance, with machine accuracy of 95% compared with 72.2% for human listeners on the in-set/out-of-set task. Results show that for extremely sparse train/reference audio streams, human speaker recognition is not nearly as reliable as machine-based speaker recognition. The proposed acoustic hole filling solution (MRNC) produces an average 7.42% relative improvement over a GMM-Cohort UBM baseline and a 19% relative improvement over the Eigenvoice baseline using the FISHER corpus.
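The neighbor-selection step can be illustrated with a minimal sketch. The abstract does not state which three similarity measures are used, so the sketch below substitutes one common choice as an assumption: each speaker is summarized by a single diagonal Gaussian over feature frames, and candidates are ranked by symmetric Kullback-Leibler divergence from the enrollment speaker. The function names (`fit_diag_gaussian`, `symmetric_kl`, `select_neighbors`) and the toy two-dimensional features are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the similarity measure here (symmetric KL between
# single diagonal Gaussians) is an assumption, not the paper's stated method.

def fit_diag_gaussian(frames):
    """Return per-dimension (mean, variance) for a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so that very short enrollment data cannot produce zero.
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def symmetric_kl(p, q):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), for diagonal Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        diff_sq = (mp - mq) ** 2
        total += 0.5 * (vp / vq + vq / vp - 2.0)
        total += 0.5 * diff_sq * (1.0 / vp + 1.0 / vq)
    return total


def select_neighbors(enroll_frames, corpus, k):
    """Return the ids of the k corpus speakers closest to the enrollment data.

    corpus maps a (hypothetical) speaker id to that speaker's feature frames.
    """
    target = fit_diag_gaussian(enroll_frames)
    scored = [
        (symmetric_kl(target, fit_diag_gaussian(frames)), speaker_id)
        for speaker_id, frames in corpus.items()
    ]
    scored.sort()
    return [speaker_id for _, speaker_id in scored[:k]]


if __name__ == "__main__":
    enroll = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
    corpus = {
        "spk_a": [[0.1, 0.0], [1.1, 1.0], [0.6, 0.5]],
        "spk_b": [[5.0, 5.0], [6.0, 6.0], [5.5, 5.5]],
        "spk_c": [[2.0, 2.0], [3.0, 3.0], [2.5, 2.5]],
    }
    print(select_neighbors(enroll, corpus, k=2))
```

In a sketch like this, the frames of the selected neighbors would then be pooled with the sparse enrollment data before training the in-set speaker model. How the pooled data are weighted or tagged is specific to the paper and is not reproduced here.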
