Abstract

The tangent plane algorithm is a fast sequential learning method for multilayered feedforward neural networks that accepts near-zero initial conditions for the connection weights, with the expectation that only the minimum number of weights will be activated. However, the inclusion of a tendency to move away from the origin in weight space can lead to large weights that are harmful to generalization. This paper evaluates two techniques used to limit the size of the weights in the tangent plane algorithm: weight growing and weight elimination. Comparative tests were carried out using the Extreme Learning Machine (ELM), a fast global minimiser giving good generalization. Experimental results show that the generalization performance of the tangent plane algorithm with weight elimination is at least as good as that of the ELM algorithm, making it a suitable alternative for problems that involve time varying data such as EEG and ECG signals.

Highlights

  • In Lee [1] an algorithm was described for supervised training in multilayered feedforward neural networks giving faster convergence and improved generalization relative to the gradient descent backpropagation algorithm

  • A directional movement vector is introduced into the training process to push the movement in weight space towards the origin

  • The ability of the new improved tangent plane algorithm (iTPA) and the original tangent plane algorithm to generalise from a given set of training data was evaluated and compared with the Extreme Learning Machine (ELM)

Introduction

In Lee [1] an algorithm was described for supervised training in multilayered feedforward neural networks giving faster convergence and improved generalization relative to the gradient descent backpropagation algorithm. This tangent plane algorithm starts the training with the connection weights set to values close to zero, in the expectation that only the minimum number of weights necessary will be activated. According to Bartlett [2], the size of the weights is more important than the number of weights in determining good generalization. This poses the following question: can we modify this algorithm so that it discourages the formation of weights with large values? Further, can the algorithm encourage weights with small values to decay rapidly to zero, producing a network having the optimum size for good generalization?
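The paper's tangent plane update itself is not reproduced on this page, but the weight-elimination idea it evaluates is a standard penalty that drives small weights towards zero while the cost of large weights saturates. The sketch below is a minimal illustration of that generic penalty added to an ordinary gradient step; the function name, the scale parameter `w0`, and the coefficient `lam` are illustrative assumptions, not values or notation taken from the paper.

```python
import numpy as np

def weight_elimination_penalty(weights, w0=1.0, lam=1e-3):
    """Generic weight-elimination penalty: lam * sum(w^2 / (w0^2 + w^2)).

    Small weights are pushed towards zero, while the penalty on large
    weights saturates near lam, so the network is nudged towards using
    only the weights it actually needs.
    """
    w2 = weights ** 2
    penalty = lam * np.sum(w2 / (w0 ** 2 + w2))
    # Derivative of the penalty with respect to each weight.
    grad = lam * 2.0 * weights * w0 ** 2 / (w0 ** 2 + w2) ** 2
    return penalty, grad

# Illustrative update step: near-zero initial weights, as in the tangent
# plane algorithm, with the penalty gradient added to a task gradient.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=20)   # near-zero initial weights
task_grad = rng.normal(size=20)       # placeholder for the error gradient
_, reg_grad = weight_elimination_penalty(w)
w -= 0.1 * (task_grad + reg_grad)     # gradient step with the penalty
```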
