Abstract

This paper presents a hybrid approach between scale-space theory and deep learning, in which a deep learning architecture is constructed by coupling parameterized scale-space operations in cascade. By sharing the learnt parameters between multiple scale channels, and by using the transformation properties of the scale-space primitives under scaling transformations, the resulting network becomes provably scale covariant. By additionally performing max pooling over the multiple scale channels, or some other permutation-invariant pooling over scales, the resulting network architecture for image classification also becomes provably scale invariant. We investigate the performance of such networks on the MNIST Large Scale dataset, which contains rescaled images from the original MNIST dataset, with scale variations over a factor of 4 in the training data and over a factor of 16 in the testing data. We demonstrate that the resulting approach allows for scale generalization, enabling good performance for classifying patterns at scales not spanned by the training data.
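
To illustrate the mechanism, the following is a minimal sketch (in Python with NumPy/SciPy; the function names, the toy single-scale feature extractor, and the choice of scale levels are hypothetical illustrations, not the paper's implementation) of how sharing parameters across scale channels and pooling over the channels yields a scale-invariant output:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def scale_channel_response(image, weights, sigma):
        # Toy single-scale feature map: a learned linear combination of
        # scale-normalized first-order Gaussian derivatives at scale sigma.
        Lx = gaussian_filter(image, sigma, order=(0, 1))  # d/dx
        Ly = gaussian_filter(image, sigma, order=(1, 0))  # d/dy
        return weights[0] * sigma * Lx + weights[1] * sigma * Ly

    def scale_invariant_score(image, weights, sigmas=(1, 2, 4, 8)):
        # The SAME weights are applied in every scale channel (parameter
        # sharing), and the responses are max-pooled over both space and
        # scale. Max over scales is a permutation-invariant pooling, so a
        # rescaled input that merely shifts the response to a neighbouring
        # scale channel leaves the pooled score (approximately) unchanged.
        return max(np.max(scale_channel_response(image, weights, s))
                   for s in sigmas)

In this sketch, rescaling the image by a factor of 2 essentially moves the response from one scale channel to the next, which the max pooling over scales absorbs, provided that the scale range covered by the channels is wide enough.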

Highlights

  • Variations in scale constitute a substantial source of variability in real-world images, because objects may have different sizes in the world and be at different distances to the camera.

  • A problem with traditional deep networks is that they are not covariant with respect to scaling transformations in the image domain.

  • Motivated by the fact that a large number of visual tasks have been successfully addressed using first- and second-order Gaussian derivatives, which are the primitive filters in the Gaussian 2-jet, we explore the consequences of using linear combinations of first- and second-order Gaussian derivatives as the class of possible filter weight primitives in a deep network.


Summary

Introduction

Variations in scale constitute a substantial source of variability in real-world images, because objects may have different sizes in the world and be at different distances to the camera. In a traditional deep network, the nonlinearities are performed relative to the current grid spacing, which implies that the network does not commute with scaling transformations. Because of this inability to handle scaling variations in the image domain, the performance of deep networks may be very poor when they are subjected to testing data at scales not spanned by the training data. We experimentally explore this idea for one specific type of architecture, in which the layers are parameterized linear combinations of Gaussian derivatives up to order two. With such a parameterization of the filters in the deep network, we obtain a compact parameterization of the degrees of freedom in the network, with on the order of 16k or 38k parameters for the networks used in the experiments in this paper, which may be advantageous in situations where only smaller sets of training data are available. The overall principle for obtaining scale covariance and scale invariance is, however, much more general, and applies to much wider classes of possible ways of defining layers from scale-space operations.
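
To make this parameterization concrete, the following is a minimal sketch (in Python with NumPy/SciPy; the function name and the weight layout are hypothetical, and the paper's actual layers are trained end-to-end within a deep network) of one output channel of a Gaussian derivative layer, formed as a learned linear combination of scale-normalized Gaussian derivatives up to order two:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_derivative_channel(image, weights, sigma):
        # Gaussian derivative responses at scale sigma; the tuple passed
        # as `order` gives the derivative order along each image axis.
        Lx  = gaussian_filter(image, sigma, order=(0, 1))
        Ly  = gaussian_filter(image, sigma, order=(1, 0))
        Lxx = gaussian_filter(image, sigma, order=(0, 2))
        Lxy = gaussian_filter(image, sigma, order=(1, 1))
        Lyy = gaussian_filter(image, sigma, order=(2, 0))
        # Scale-normalized derivatives: an order-m derivative is multiplied
        # by sigma^m, so that the responses transform in a predictable way
        # under rescalings of the image (the basis of scale covariance).
        basis = (sigma * Lx, sigma * Ly,
                 sigma**2 * Lxx, sigma**2 * Lxy, sigma**2 * Lyy)
        # One output channel = learned linear combination of the basis
        # responses (five coefficients per filter in this sketch).
        return sum(w * b for w, b in zip(weights, basis))

Parameterizing each filter by a handful of coefficients over a fixed derivative basis, rather than by a free convolution kernel, is what keeps the total parameter count of the networks small.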

Relations to Previous Contribution
Relations to Previous Work
Scale Covariance and Scale Invariance
Approach to Scale Generalization
Influence of the Inner and the Outer Scales of the Image
Gaussian Derivative Networks
Gaussian Derivative Layers
Definition of a Gaussian Derivative Network
Provable Scale Covariance
Provable Scale Invariance
Experiments with a Single-Scale-Channel Network
Discrete Implementation
Experiments with a Multi-scale-Channel Network
Scale Selection Properties
Findings
Summary and Discussion