Abstract

This paper addresses the problem of detecting speaker changes and clustering speakers when no information is available about the speaker classes or even the total number of classes. The study provides preprocessing for a method that adapts speaker information based on incremental on-line speaker clustering. We assume that no prior information on speakers is available (no speaker model, no training phase) and that people do not speak simultaneously. The aim is to apply speaker grouping information to speaker adaptation for speech recognition.

We use Vector Quantization (VQ) distortion as the criterion. A speaker model is created from successive utterances as a codebook by a VQ algorithm, and the VQ distortion is calculated between the model and an utterance.

The results of experiments on speaker change detection and speaker clustering are presented. The speaker change detection results were compared with those of the Generalized Likelihood Ratio (GLR) and the Bayesian Information Criterion (BIC). GLR and BIC yielded F-measures of 75.8 and 81.4, respectively, whereas the proposed method yielded an F-measure of 84.4. Furthermore, by combining the method with an acoustic back-off method, we obtained an F-measure of 93.1 and a classification rate of 97.7%. These results show the superiority of our proposed method. Finally, we demonstrate the validity of our proposed method by experiments on speech recognition using on-line speaker adaptation.

© 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(13): 25–35, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10488
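To make the VQ distortion criterion concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the acoustic features, the codebook size, the codebook training algorithm, or the decision threshold, so everything here is an assumption made for illustration: frames are plain lists of floats, the codebook is trained with ordinary k-means, distortion is the mean squared Euclidean distance from each frame to its nearest codeword, and the names `train_codebook`, `vq_distortion`, and `is_speaker_change` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, size, iters=20, seed=0):
    """Build a speaker codebook with plain k-means (an assumed stand-in
    for whatever VQ training algorithm the paper actually uses)."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        # Assign every frame to its nearest codeword.
        cells = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(size), key=lambda k: sq_dist(f, codebook[k]))
            cells[nearest].append(f)
        # Move each codeword to the centroid of its cell (skip empty cells).
        for k, cell in enumerate(cells):
            if cell:
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def vq_distortion(codebook, frames):
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def is_speaker_change(codebook, frames, threshold):
    """Flag a speaker change when the utterance fits the current speaker's
    codebook poorly. The threshold is a hypothetical tuning parameter."""
    return vq_distortion(codebook, frames) > threshold
```

Under these assumptions, an utterance from the speaker who produced the codebook should give a low distortion, while an utterance from a different speaker should give a noticeably higher one; comparing the distortion against a threshold is one simple way to turn that gap into a change decision.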
