Abstract
The paper addresses unsupervised speaker indexing and automatic speech recognition of discussions. In speaker indexing, there are two cases, where the number of speakers is unknown beforehand and where the number is known. When the specified number is unknown, it is difficult to apply to various data because it needs to determine several parameters like threshold. In addition, serious problems arise in applying a uniform model because variations in the utterance durations of speakers are large. We thus propose a method which can robustly perform speaker indexing for the two cases using a flexible framework in which an optimal speaker model (GMM or VQ) is selected based on the BIC (Bayesian information criterion). Moreover, we propose a combination method of speaker adaptation based on speaker selection and the indexing method. For real discussion archives, we demonstrated that indexing performance is higher than that of conventional methods for the two cases and speech recognition performance was improved by the combination method.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.