Abstract

A long-standing open problem in the theory of neural networks is the development of quantitative methods to estimate and compare the capabilities of different architectures. Here we define the capacity of an architecture as the binary logarithm of the number of functions it can compute as the synaptic weights are varied. The capacity provides an upper bound on the number of bits that can be extracted from the training data and stored in the architecture during learning. We study the capacity of layered, fully connected architectures of linear threshold neurons with L layers of sizes n1, n2, …, nL and show that, in essence, the capacity is given by a cubic polynomial in the layer sizes:

$$C(n_1, \dots, n_L) = \sum_{k=1}^{L-1} \min(n_1, \dots, n_k)\, n_k\, n_{k+1},$$

where layers that are smaller than all previous layers act as bottlenecks. In proving the main result, we also develop new techniques (multiplexing, enrichment, and stacking) as well as new bounds on the capacity of finite sets. We use the main result to identify architectures with maximal or minimal capacity under a number of natural constraints. This leads to the notion of structural regularization for deep architectures. While in general, everything else being equal, shallow networks compute more functions than deep networks, the functions computed by deep networks are more regular and “interesting”.
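To make the formula concrete, here is a minimal Python sketch of the capacity estimate as written above. The function name and the example layer sizes are ours, chosen only to illustrate the bottleneck role of the min(·) factor; the paper's precise statement, with its exact constants and conditions, is not reproduced here.

```python
def capacity_estimate(layer_sizes):
    """Cubic capacity estimate quoted in the abstract:
    C(n_1, ..., n_L) = sum_{k=1}^{L-1} min(n_1, ..., n_k) * n_k * n_{k+1}.
    The min(...) factor is how a layer narrower than all preceding
    layers acts as a bottleneck for everything downstream.
    """
    n = layer_sizes
    total = 0
    for k in range(len(n) - 1):        # 0-based k plays the role of k = 1, ..., L-1
        bottleneck = min(n[: k + 1])   # min(n_1, ..., n_k)
        total += bottleneck * n[k] * n[k + 1]
    return total


# Hypothetical layer sizes, chosen only to illustrate the bottleneck effect.
print(capacity_estimate([100, 200, 1]))     # 2020000
print(capacity_estimate([100, 5, 100, 1]))  # 53000
print(capacity_estimate([100, 100, 5, 1]))  # 1050025
```

Under this reading, the last two calls show that the position of a narrow layer matters: placed early, it caps every subsequent term through the min(·) factor; placed late, it only affects the final terms.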

Highlights

  • Since their early beginnings (e.g. [17, 21]), neural networks have come a long way. Today they are at the center of myriad successful applications, spanning the gamut from games all the way to biomedicine [22, 23, 4]. In spite of these successes, the problem of quantifying the power of a neural architecture, in terms of the space of functions it can implement as its synaptic weights are varied, has remained open.

  • In this work we introduce a notion of capacity for neural architectures and study how this capacity can be computed.

  • The bulk of this paper focuses on estimating the capacity of arbitrary feedforward, layered, fully connected architectures of any depth, which are widely used in many applications.

Summary

Introduction

Since their early beginnings (e.g. [17, 21]), neural networks have come a long way. There are various universal approximation theorems [15, 13] showing, for instance, that continuous functions defined over compact sets can be approximated to arbitrary degrees of precision by architectures of the form A(n1, ∞, m), where we use “∞” to denote the fact that the hidden layer may be arbitrarily large. Beyond these results, very little is known about the functional capacity of such architectures. The main result of this paper, Theorem 3.1, provides an estimate of the capacity of a general feedforward, layered, fully connected neural network of linear threshold gates. Suppose that such a network has L layers with nk neurons in layer k, where k = 1 corresponds to the input layer and k = L corresponds to the output layer. A reader familiar with neural network theory may glance through the background material and go directly to Section 3, which describes the new results and provides a roadmap for the paper.
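As a small worked instance of the estimate, in the notation just introduced, the layer sizes A(4, 3, 2) are hypothetical and chosen only for arithmetic clarity (the paper's precise statement carries constants and conditions not reproduced here):

$$C(4, 3, 2) = \min(4)\cdot 4\cdot 3 + \min(4, 3)\cdot 3\cdot 2 = 48 + 18 = 66.$$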

2. Neural architectures and their capacities
3. Overview of new results
4. Useful examples of threshold maps
5. Capacity of networks: upper bounds
6. Capacity of product sets: slicing
7. Capacity of general sets
8. Networks with one hidden layer: multiplexing
9. Networks with two hidden layers: enrichment
10. Networks with arbitrarily many layers: stacking
11. Extremal capacity
12. Structural regularization
13. Polynomial threshold functions
14. Open questions
15. Conclusion
