Abstract
In this paper, the authors propose a method for assigning probabilistic class labels to unlabeled time-series data, which can be collected inexpensively, so that the data can be used alongside labeled data to train hidden Markov models (HMMs) and improve their performance as classifiers. Conventional methods that use unlabeled data through probabilistic class labels were designed only for static data and cannot handle the time-series data used with HMMs. The authors therefore extend the conventional methods in two steps. First, they form extended tied-mixture hidden Markov models (ETM-HMMs) by mixing the multiple HMMs that exist independently for each class. Second, they introduce the extended Baum-Welch (EBW) algorithm to train the ETM-HMMs on time-series data in which labeled and unlabeled examples coexist. Assuming a situation in which labeled training data are insufficient, the authors evaluate the proposed method in classification experiments on Japanese sign language data and voice data. When unlabeled data were added to the labeled training data using the proposed method, the classification error rate for Japanese sign language signs fell from 38.7% to 30.4%. When the model was also made more detailed in addition to adding unlabeled data, the classification error rate for phonemes fell from 69.1% to 52.9%.
© 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(13): 1–12, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10537
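To make the idea of a probabilistic class label concrete, the sketch below computes a posterior over classes for one unlabeled sequence from independent per-class HMMs. It is not the authors' ETM-HMM or EBW procedure, which the abstract does not specify in detail. It is only a simpler illustration that applies Bayes' rule to likelihoods from the standard forward algorithm. The discrete-symbol setting, the function names, and all toy parameters are assumptions made for illustration.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) for a discrete HMM via the scaled forward algorithm."""
    n_states = len(start)
    # Initial forward variables for the first observation.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        # Rescale at every step to avoid underflow; accumulate the log of the scale.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def soft_class_labels(obs, class_hmms, priors):
    """Return P(class | obs) using Bayes' rule over per-class HMM likelihoods."""
    scores = [
        math.log(prior) + forward_log_likelihood(obs, *hmm)
        for hmm, prior in zip(class_hmms, priors)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy models: (start, transition, emission) over two symbols, 0 and 1.
hmm_a = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.8, 0.2]])  # favors symbol 0
hmm_b = ([0.5, 0.5], [[0.6, 0.4], [0.5, 0.5]], [[0.1, 0.9], [0.2, 0.8]])  # favors symbol 1

# Soft label for one unlabeled sequence, assuming equal class priors.
labels = soft_class_labels([0, 0, 1, 0], [hmm_a, hmm_b], [0.5, 0.5])
print(labels)
```

In a semi-supervised setting, soft labels of this kind could weight each unlabeled sequence's contribution to every class model during re-estimation; how the paper's EBW algorithm actually does this is not described in the abstract.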