To obtain both good training performance and good generalization in multilayer perceptron (MLP) networks, it is essential to use small networks that avoid overfitting the training data. A common approach is to train a large network and then prune the unnecessary units or weights. This paper presents an effective hidden-unit pruning algorithm, called linear dependence (LD) pruning, that utilizes sets of linear equations. In this approach, each hidden unit's output (basis function) is modeled as a linear combination of the outputs of the other hidden units. The least useful hidden unit is identified as the one predicted to increase the training error the least when replaced by its model. Once this hidden unit is found, the pruning algorithm replaces it with its model and retrains the network output weights with one iteration of training. The LD pruning algorithm's performance is compared with that of a modified optimal brain surgeon (OBS) pruning algorithm. We show that LD pruning performs as well as the OBS method yet requires orders of magnitude fewer multiplies.
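The sketch below illustrates the general idea described above: each hidden unit's activations are regressed onto those of the remaining units, the unit whose replacement is predicted to raise the training error least is removed, and its modeled contribution is folded into the surviving output weights. This is a minimal, hypothetical illustration, not the paper's implementation; the names (`H`, `W_out`, `prune_one_unit`) are assumptions, and a single least-squares solve stands in for the paper's one iteration of output-weight retraining.

```python
# Minimal sketch of linear-dependence-style hidden-unit pruning, assuming a
# single-hidden-layer network whose hidden activations H (N x n_hidden) and
# output weights W_out (n_hidden x n_outputs) are already available.
# Names and details are illustrative, not taken from the paper.
import numpy as np

def prune_one_unit(H, W_out, targets):
    """Remove the hidden unit whose linear model least increases training error."""
    n_hidden = H.shape[1]
    best = None
    for j in range(n_hidden):
        others = np.delete(H, j, axis=1)
        # Model unit j's output as a linear combination of the other units' outputs.
        coeffs, *_ = np.linalg.lstsq(others, H[:, j], rcond=None)
        # Predicted training error if unit j is replaced by its model:
        # fold unit j's output weights into those of the remaining units.
        W_reduced = np.delete(W_out, j, axis=0) + np.outer(coeffs, W_out[j])
        err = np.mean((others @ W_reduced - targets) ** 2)
        if best is None or err < best[0]:
            best = (err, j)
    _, j = best
    H_new = np.delete(H, j, axis=1)
    # One least-squares pass stands in for one iteration of output-weight retraining.
    W_new, *_ = np.linalg.lstsq(H_new, targets, rcond=None)
    return j, H_new, W_new
```

In practice this selection-and-replacement step would be repeated, pruning one unit at a time until validation error begins to rise.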