Abstract

The 'hierarchical mixture of experts' (HME) is a tree-structured statistical model that provides an alternative to the multilayer perceptron. Its training algorithm consists of a number of forward and backward passes through the tree, which are computationally expensive, especially when the tree is large. To reduce the computation, we may either allow the network to find its own structure in a constructive manner (tree growing) or consider only the most likely paths through the tree (path pruning). Path pruning keeps the number of parameters constant but evaluates only the most likely paths through the tree at any time, which leads to significant speedups in both training and evaluation. In the growing algorithm, we start with a small tree and apply a splitting criterion based on maximum likelihood to each terminal node. After splitting the best node according to this criterion, we retrain the tree for a set number of iterations, or until there is no further increase in likelihood, at which point the tree is grown again. This results in a flexible architecture that is both faster to train and more efficient in its use of parameters. To aid the convergence of these algorithms, it is beneficial to introduce regularization into the HME, which prevents the growth of large weights that would otherwise cause branches of the tree to be pinched off. Regularization also aids generalization, as we demonstrate on a toy regression problem. Results for the growing and pruning algorithms show significant speedups over conventional algorithms in discriminating between two interlocking spirals and in classifying 8-bit parity patterns.
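
As a concrete illustration of the path-pruning idea, the following minimal Python sketch (not taken from the paper) evaluates a binary HME for regression while skipping any subtree whose cumulative gating probability falls below a threshold. The class and function names, the logistic gates, the linear experts, and the threshold value are illustrative assumptions rather than the authors' implementation.

import numpy as np

class Node:
    """One node of a binary HME: either an internal gating node or a leaf expert."""
    def __init__(self, left=None, right=None, gate_w=None, expert_w=None):
        self.left, self.right = left, right
        self.gate_w = gate_w        # logistic gate weights (internal nodes only)
        self.expert_w = expert_w    # linear expert weights (leaves only)

    def is_leaf(self):
        return self.expert_w is not None

def predict_pruned(node, x, path_prob=1.0, threshold=0.01):
    """Prediction for input x, visiting only branches whose cumulative
    gating probability exceeds `threshold` (path pruning)."""
    if node.is_leaf():
        return float(node.expert_w @ x)              # linear expert output
    g = 1.0 / (1.0 + np.exp(-node.gate_w @ x))       # probability of the left branch
    out, kept = 0.0, 0.0
    for child, p in ((node.left, g), (node.right, 1.0 - g)):
        if path_prob * p > threshold:                # skip unlikely paths
            out += p * predict_pruned(child, x, path_prob * p, threshold)
            kept += p
    return out / kept if kept > 0.0 else out         # renormalise over surviving paths

# Example: a depth-one HME with two linear experts on 2-D inputs plus a bias term.
x = np.array([0.5, -1.0, 1.0])
root = Node(left=Node(expert_w=np.array([1.0, 0.0, 0.0])),
            right=Node(expert_w=np.array([0.0, 1.0, 0.0])),
            gate_w=np.array([2.0, 0.0, 0.0]))
print(predict_pruned(root, x))

Renormalizing over the surviving paths keeps the pruned mixture a proper weighted average, and the same threshold test can be applied during training so that only the retained experts and gates are updated, which is where the speedups in both training and evaluation come from.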
