Abstract

This article concerns the expressive power of depth in neural nets with ReLU activations and a bounded width. We are particularly interested in the following questions: What is the minimal width w_min(d) so that ReLU nets of width w_min(d) (and arbitrary depth) can approximate any continuous function on the unit cube [0,1]^d arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? We obtain an essentially complete answer to these questions for convex functions. Our approach is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well suited to represent convex functions. In particular, we prove that ReLU nets with width d + 1 can approximate any continuous convex function of d variables arbitrarily well. These results then give quantitative depth estimates for the rate of approximation of any continuous scalar function on the d-dimensional cube [0,1]^d by ReLU nets with width d + 3.
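
As a worked formula (my own paraphrase of the definition implicit in the questions above, not notation quoted from the paper), the minimal width can be written as

```latex
\[
w_{\min}(d) \;=\; \min\Bigl\{\, w \in \mathbb{N} \;:\;
  \forall\, f \in C([0,1]^d),\ \forall\, \varepsilon > 0,\
  \text{there is a ReLU net } \mathcal{N} \text{ of width } w
  \text{ (and some finite depth) with }
  \sup_{x \in [0,1]^d} \bigl| f(x) - \mathcal{N}(x) \bigr| < \varepsilon
\,\Bigr\}.
\]
```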

Highlights

  • Over the past several years, deep neural nets have become the state of the art in a remarkable number of machine learning problems, from mastering Go to image recognition/segmentation and machine translation. Despite all their practical successes, a robust theory of why they work so well is in its infancy.

  • We show that every convex function on [0,1]^d that is piecewise affine with N pieces can be represented exactly by a ReLU net with width d + 1 and depth N (see the sketch after this list).

  • In this article, we considered the expressive power of ReLU networks with bounded hidden-layer widths.
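
The following identities are a minimal sketch of why ReLU units pair naturally with convex piecewise affine functions (standard facts stated in my own notation, not formulas quoted from the paper): such a function is a maximum of its affine pieces, and a pairwise maximum costs exactly one ReLU unit.

```latex
\[
f(x) \;=\; \max_{1 \le k \le N} \ell_k(x),
\qquad \ell_k(x) = a_k \cdot x + b_k,
\qquad x \in [0,1]^d,
\]
\[
\max(u, v) \;=\; v + \operatorname{ReLU}(u - v)
\quad \text{for all } u, v \in \mathbb{R},
\qquad \operatorname{ReLU}(t) = \max(t, 0).
\]
```

Iterating the second identity over the N pieces, one piece per layer, is the basic shape of a width-(d + 1), depth-N construction.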


Summary

Introduction

Over the past several years, deep neural nets have become the state of the art in a remarkable number of machine learning problems, from mastering Go to image recognition/segmentation and machine translation (see the review article [1] for more background). ReLU nets of width w can approximate any positive convex function on [0,1]^d arbitrarily well (3). Theorem 1 addresses Q2, the second of the questions above, by providing quantitative estimates on the depth of a ReLU net with width d + 1 that approximates a given convex function. We prove that the depth of the network that computes such a function is bounded by the number of affine pieces it contains. This extends the results of Arora-Basu-Mianjy-Mukherjee (e.g., Theorem 2.1 and Corollary 2.2 in [2]). We show that every convex function on [0,1]^d that is piecewise affine with N pieces can be represented exactly by a ReLU net with width d + 1 and depth N.
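
The snippet below is a minimal NumPy sketch of this kind of construction, written for illustration under the assumption that f is given explicitly as a maximum of N affine pieces; it mirrors the shape of the argument (each hidden layer has d + 1 ReLU units: d copy x, which is valid since x >= 0 on [0,1]^d, and one updates a running maximum via max(u, v) = v + ReLU(u - v)), but it is not the paper's proof.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def convex_pwl_via_narrow_relu_net(x, A, b):
    """Evaluate f(x) = max_k (A[k] @ x + b[k]) with a width-(d+1), depth-N ReLU net.

    Hidden layer k holds (x_1, ..., x_d, t_k), where t_k = ReLU(m_{k-1} - l_k(x)),
    and the running maximum m_k = max_{j <= k} l_j(x) is recovered affinely as
    m_k = t_k + l_k(x).  Illustrative sketch only.
    """
    N, d = A.shape
    assert np.all(x >= 0), "copying x through ReLU assumes x lies in [0,1]^d"
    ell = lambda k: A[k] @ x + b[k]   # k-th affine piece l_k(x)
    m = ell(0)                        # m_0 := l_1(x), so layer 1 gives t_1 = 0
    for k in range(N):
        # Hidden layer k: d+1 ReLU units -> (ReLU(x), ReLU(m_{k-1} - l_k(x)))
        x = relu(x)                   # identity on [0,1]^d
        t = relu(m - ell(k))
        m = t + ell(k)                # affine readout: running max over pieces 1..k+1
    return m                          # = max_k l_k(x)

# Quick check against direct evaluation of the maximum of the affine pieces.
rng = np.random.default_rng(0)
d, N = 3, 7
A, b = rng.normal(size=(N, d)), rng.normal(size=N)
for _ in range(5):
    x = rng.uniform(size=d)
    assert np.isclose(convex_pwl_via_narrow_relu_net(x, A, b), np.max(A @ x + b))
```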

Statement of Results
Relation to Previous Work
Proof of Theorem 2
Proof of Theorem 1
Conclusions
