Abstract

Regularization of neural networks can alleviate overfitting in the training phase. Current regularization methods, such as Dropout and DropConnect, randomly drop neural nodes or connections based on a uniform prior. Such a data-independent strategy does not take into account the quality of individual units or connections. In this paper, we aim to develop a data-dependent approach to regularizing neural networks in the framework of Information Geometry. A measure of the quality of connections, namely confidence, is proposed. Specifically, the confidence of a connection is derived from its contribution to the Fisher information distance. The network is adjusted by retaining the confident connections and discarding the less confident ones. The adjusted network, named ConfNet, carries the majority of the variation in the sample data. The relationships among confidence estimation, Maximum Likelihood Estimation and classical model selection criteria (such as the Akaike information criterion) are investigated and discussed theoretically. Furthermore, a Stochastic ConfNet is designed by adding a self-adaptive probabilistic sampling strategy. The proposed data-dependent regularization methods achieve promising experimental results on three data collections: MNIST, CIFAR-10 and CIFAR-100.
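The abstract describes the idea at a high level only. A minimal sketch of how such a confidence score might be computed and used is given below; it assumes a diagonal empirical Fisher information as a stand-in for the paper's contribution-to-Fisher-distance measure, and all names (confidence_scores, prune_by_confidence, keep_ratio) are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: rank connections by a Fisher-information-based
# "confidence" score and keep only the most confident fraction.
import torch
import torch.nn.functional as F

def confidence_scores(model, data_loader):
    """Approximate per-weight confidence as the diagonal empirical Fisher:
    the average squared gradient of the log-likelihood w.r.t. each weight."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, y in data_loader:
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=1)
        F.nll_loss(log_probs, y).backward()      # negative log-likelihood
        for n, p in model.named_parameters():
            scores[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: s / n_batches for n, s in scores.items()}

def prune_by_confidence(model, scores, keep_ratio=0.5):
    """Zero out the least confident connections in each weight matrix."""
    for n, p in model.named_parameters():
        if p.dim() < 2:                          # skip biases
            continue
        k = max(1, int(keep_ratio * p.numel()))
        threshold = scores[n].flatten().topk(k).values.min()
        mask = (scores[n] >= threshold).float()
        p.data.mul_(mask)                        # discard low-confidence connections
```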

Highlights

  • Neural networks (NNs) that consist of multiple hidden layers can automatically learn effective representations for learning tasks such as speech recognition [1,2,3], image classification [4,5,6], and natural language processing [7]

  • The performance of deep neural networks (DNNs) has been significantly improved by such an ensemble strategy

  • DNNs with regularization achieve better performance than DNNs without regularization. This reflects the existence of overfitting during training and the effectiveness of the regularization methods


Summary

Introduction

Neural networks (NNs) that consist of multiple hidden layers can automatically learn effective representations for learning tasks such as speech recognition [1,2,3], image classification [4,5,6], and natural language processing [7]. A neural network with too many layers or units, especially a deep neural network (DNN) [8], tends to overfit in the training phase, leading to poor predictive performance in the testing phase. In order to alleviate the overfitting problem in DNNs, many regularization methods have been developed, including data augmentation [9], early stopping, amending cost functions with weight penalties (ℓ1 or ℓ2), and modifying networks by randomly dropping a certain percentage of units (Dropout [10]) or connections (DropConnect [11]). The Dropout strategy randomly drops units (along with their connections) during training, so that a large number of sub-networks with randomly dropped units are trained. In DropConnect [11], a network is regularized by randomly drawing a subset of connections independently from a uniform prior during the training phase, and a Gaussian sampling procedure is used to approximate the averaged network at inference time.
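For concreteness, the two random-masking schemes described above can be sketched as follows. This is an illustrative sketch, not the authors' code: Dropout zeroes whole output units, while DropConnect zeroes individual weights, each with an independent Bernoulli mask drawn from a uniform (data-independent) prior.

```python
# Illustrative sketch of uniform-prior masking during training.
import torch

def dropout_layer(x, W, b, drop_prob=0.5):
    """Dropout: randomly zero entire output units (inverted-dropout scaling)."""
    h = x @ W + b
    unit_mask = (torch.rand(h.shape[-1]) > drop_prob).float()
    return h * unit_mask / (1.0 - drop_prob)

def dropconnect_layer(x, W, b, drop_prob=0.5):
    """DropConnect: randomly zero individual connections (weights)."""
    weight_mask = (torch.rand_like(W) > drop_prob).float()
    return x @ (W * weight_mask) + b
```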
