Abstract

A decision tree is built by successively splitting the observation frames of a phonetic unit according to the best phonetic questions. To prevent overly large tree models, a stopping criterion is required to suppress tree growth. It is crucial to exploit goodness-of-split criteria both to choose the best question for splitting each node and to test whether splitting should be terminated, so that robust tree models can be established. In this study, we apply Hubert's Γ statistic as the node-splitting criterion and the T²-statistic as the stopping criterion. Hubert's Γ statistic is a cluster validity measure that characterizes the degree of clustering in the available data, which makes it useful for selecting the best questions to split tree nodes. Further, we examine the population closeness of two child nodes at a given significance level: the T²-statistic is computed to test whether the corresponding mean vectors are close together, and splitting is stopped when this hypothesis is validated. In continuous speech recognition experiments, the proposed methods achieve better recognition rates with smaller tree models than the maximum likelihood and minimum description length criteria.
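
The abstract does not give the exact formulations used in the paper, but both statistics have standard forms. The sketch below is a minimal illustration, assuming the normalized Hubert's Γ (the correlation between pairwise frame distances and a same/different-child indicator) as the split score, and the two-sample Hotelling T²-test on the child mean vectors as the stopping test; the function names and the threshold convention are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.stats import f as f_dist
from scipy.spatial.distance import pdist

def hubert_gamma(X, labels):
    """Normalized Hubert's Gamma for a candidate question.

    X      : (n, p) observation frames at the node
    labels : (n,) 0/1 child assignment induced by the question
    Returns the correlation between pairwise distances and a
    different-child indicator; a higher value means the question
    separates the frames into better-clustered child nodes.
    """
    P = pdist(X)  # pairwise Euclidean distances between frames
    # Q(i,j) = 1 when frames i, j fall in different child nodes
    Q = pdist(np.asarray(labels, float).reshape(-1, 1), metric="hamming")
    return np.corrcoef(P, Q)[0, 1]

def should_stop_splitting(X1, X2, alpha=0.05):
    """Two-sample Hotelling T^2 test on the child mean vectors.

    Returns True when the means are statistically indistinguishable
    at significance level alpha, i.e. splitting should stop.
    """
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    # Pooled sample covariance of the two child populations
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)
    # Convert T^2 to an F-statistic and compare with the critical value
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return F < f_dist.ppf(1 - alpha, p, n1 + n2 - p - 1)

# Hypothetical usage on synthetic frames: score one candidate question,
# then test whether the resulting children warrant a split at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
labels = (X[:, 0] > 0).astype(int)  # stand-in for a phonetic question's yes/no answer
score = hubert_gamma(X, labels)
stop = should_stop_splitting(X[labels == 0], X[labels == 1])
```

In this scheme the question maximizing the Γ score is chosen at each node, and the T² test then decides, at significance level α, whether the two resulting children are different enough to keep; this is what allows the trees to stay small without a likelihood- or description-length-based threshold.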
