Abstract

We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs. To this end, we provide a simple technique for computing, for multiple p-norms, an upper bound on the Lipschitz constant of a feed-forward neural network composed of commonly used layer types. Our technique is then used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods. Our evaluation study shows that the performance of the resulting models exceeds that of models trained with other common regularisers. We also provide evidence that the hyperparameters are intuitive to tune, demonstrate how the choice of norm for computing the Lipschitz constant impacts the resulting model, and show that the performance gains provided by our method are particularly noticeable when only a small amount of training data is available.
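
As an illustration of the two ideas in the abstract, the sketch below assumes a network given as a list of NumPy weight matrices for fully connected layers with 1-Lipschitz activations (e.g. ReLU); it is not the authors' reference implementation, and the names operator_norm, lipschitz_upper_bound, project and lam are illustrative assumptions.

    import numpy as np

    def operator_norm(W, p):
        # Operator norm of a weight matrix W with shape (outputs, inputs).
        if p == 1:
            return np.abs(W).sum(axis=0).max()   # maximum absolute column sum
        if p == np.inf:
            return np.abs(W).sum(axis=1).max()   # maximum absolute row sum
        if p == 2:
            return np.linalg.norm(W, 2)          # largest singular value
        raise ValueError("p must be 1, 2 or np.inf")

    def lipschitz_upper_bound(weights, p=2):
        # Upper bound on the network's Lipschitz constant: the product of the
        # per-layer operator norms (activations assumed 1-Lipschitz, biases ignored).
        bound = 1.0
        for W in weights:
            bound *= operator_norm(W, p)
        return bound

    def project(W, lam, p=2):
        # Constraint step applied after each stochastic gradient update: rescale W
        # so its operator norm is at most lam (a simple rescaling, not necessarily
        # the exact Euclidean projection onto the norm ball).
        norm = operator_norm(W, p)
        return W if norm <= lam else W * (lam / norm)

Applying project to every weight matrix after each update keeps each layer's operator norm at or below lam, so the product returned by lipschitz_upper_bound cannot exceed lam raised to the number of layers.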

Highlights

  • Supervised learning is primarily concerned with the problem of approximating a function given examples of what output should be produced for a particular input

  • The results on CIFAR-100 follow a similar trend to those observed on CIFAR-10: Lipschitz constant constraint (LCC) performs best, dropout provides a small increase in performance over no regularisation, and combining dropout with other approaches can sometimes provide a small boost in accuracy

  • This paper has presented a simple and effective regularisation technique for deep feed-forward neural networks called Lipschitz constant constraint (LCC), shown that it is applicable to a variety of feed-forward neural network architectures, and established that it is suited to situations where only a small amount of training data is available

Summary

Introduction

Supervised learning is primarily concerned with the problem of approximating a function given examples of what output should be produced for a particular input. To do this, we need to select an appropriate space of functions in which the machine should search for a good approximation, and an algorithm to search through this space. This is typically done by first picking a large family of models, such as support vector machines or decision trees, and then applying a suitable search algorithm. Well-understood regularisation approaches adapted from linear models, such as applying an ℓ2 penalty term to the model parameters, are known to be less effective than heuristic approaches such as dropout (Srivastava et al. 2014). This provides a clear motivation for developing well-founded and effective regularisation methods for neural networks.
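
To make the contrast concrete, the following minimal sketch shows the ℓ2 (weight decay) penalty mentioned above being added to a training objective; the names l2_penalised_loss, data_loss and mu are illustrative assumptions rather than anything defined in the paper.

    import numpy as np

    def l2_penalised_loss(data_loss, weights, mu):
        # Standard weight decay: the data loss plus mu times the sum of squared weights.
        # This softly discourages large weights but, unlike a hard Lipschitz constraint,
        # does not guarantee any bound on the network's sensitivity to its inputs.
        return data_loss + mu * sum(np.sum(W ** 2) for W in weights)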

Related work
Computing the Lipschitz constant
Fully connected layers
Convolutional layers
Pooling layers and activation functions
Residual connections
Constraining the Lipschitz constant
Stability of p‐norm estimation
Compatibility with batch normalisation
Interaction with dropout
Experiments
Method
CIFAR‐10
CIFAR‐100
MNIST and Fashion‐MNIST
Street view house numbers
Fully connected networks
Sensitivity to λ
Sample efficiency
Do other methods constrain the Lipschitz constant?
Findings
Conclusion