Abstract

Convolutional neural networks provide an efficient way to constrain the complexity of feedforward neural networks through weight sharing and restriction to local connections. This network topology has been applied in particular to image classification when sophisticated preprocessing is to be avoided and raw images are to be classified directly. In this paper, two variants of convolutional networks, the neocognitron and a modification of the neocognitron, are compared with classifiers based on fully connected feedforward layers (i.e., multilayer perceptron, nearest-neighbor classifier, auto-encoding network) with respect to their visual recognition performance. Besides the original neocognitron, a modification is proposed that combines perceptron-type neurons with the localized network structure of the neocognitron. Instead of training the convolutional networks by time-consuming error backpropagation, a modular procedure is applied in which the layers are trained sequentially from the input to the output layer so as to recognize features of increasing complexity. For a quantitative experimental comparison with standard classifiers, two very different recognition tasks have been chosen: handwritten digit recognition and face recognition. In the first task, handwritten digit recognition, the generalization ability of convolutional networks is compared with that of fully connected networks. In several experiments the influence of variations in position, size, and orientation of the digits is determined, and the relation between training sample size and validation error is examined. In the second task, recognition of human faces is investigated under constrained and variable conditions of face orientation and illumination, and the limitations of convolutional networks are discussed.
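To make the notion of weight sharing with local connections concrete, the following minimal sketch (not taken from the paper; all names, sizes, and parameter values are illustrative) computes a single convolutional feature map: one small kernel is applied to every local patch of the input image, so the layer has only as many free weights as the kernel has entries.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """One feature map: a single shared kernel applied to every local patch
    (weight sharing + local receptive fields), 'valid' cross-correlation."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]   # local connection only
            out[i, j] = np.sum(patch * kernel)  # same weights at every position
    return np.tanh(out)                         # pointwise nonlinearity

# Illustrative usage: a 28x28 input image and a 5x5 shared kernel.
rng = np.random.default_rng(0)
image = rng.random((28, 28))
kernel = rng.standard_normal((5, 5)) * 0.1
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (24, 24) outputs, but only 25 trainable weights
```

A fully connected layer over the same 28x28 input would need a separate weight per input pixel for each output unit, whereas the shared 5x5 kernel reduces this to 25 weights; this is the complexity constraint referred to above, and it is the structure on which the layer-by-layer modular training described in the abstract operates.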
