Abstract

In this paper, we examine the problem of text-independent open-set speaker identification (OS-SI) in broadcast news. In particular, we investigate how the size of the registered-speaker population affects OS-SI performance, a central issue in designing practical OS-SI systems. We amend the maximum mutual information (MMI)-based discriminative training scheme to facilitate its incorporation into OS-SI systems. We also improve the implementation so that the MMI-based approach can be applied to 2048-component Gaussian mixture models. All systems are evaluated on the NIST RT-03, RT-04, and FBIS corpora, with a maximum of 82 registered speakers. Our study shows that MMI-based discriminative training yields a notable performance improvement, reducing the equal error rate (EER) by 15.9% relative to the GMM-MAP scheme.
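
To make the terms "open-set" and "relative reduction" concrete, the following is a minimal sketch, not the paper's implementation. It assumes each registered speaker's model has already produced one score for a test segment (higher meaning a better match); the function names, the score dictionary, and the threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def open_set_identify(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set decision (illustrative): pick the best-scoring registered
    speaker, then reject the segment as an unregistered speaker if that
    score falls below the threshold."""
    if not scores:
        return None
    best_speaker = max(scores, key=scores.get)
    if scores[best_speaker] >= threshold:
        return best_speaker
    return None  # treated as "not a registered speaker"


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline,
    e.g. the EER of one system relative to another."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical scores for three registered speakers.
    example_scores = {"spk_a": -1.2, "spk_b": 0.7, "spk_c": 0.1}
    print(open_set_identify(example_scores, threshold=0.5))
    print(open_set_identify(example_scores, threshold=1.0))
    # Hypothetical error rates, not values from the paper.
    print(relative_reduction(0.20, 0.15))
```

The threshold trades false acceptances of unregistered speakers against false rejections of registered ones; the EER is the operating point at which those two error rates are equal.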
