Abstract
Deep neural networks have performed very successfully in a wide range of tasks, but a theory of why they work so well is still at an early stage. Recently, the expressive power of neural networks, which is important for understanding deep learning, has received considerable attention. Classic results by Cybenko, Barron, and others state that a network with a single hidden layer and a suitable activation function is a universal approximator. A few years ago, researchers began to study how width affects the expressiveness of neural networks, that is, to seek a universal approximation theorem for deep neural networks with the Rectified Linear Unit (ReLU) activation function and bounded width. Here, using an approximate identity, we show that any continuous function on a compact subset of $\mathbb{R}^{n_{in}}$, $n_{in} \in \mathbb{N}$, can be approximated by a ReLU network whose hidden layers have at most $n_{in}+5$ nodes.
Highlights
Over the past several years, deep neural networks have achieved state-of-the-art performance in a wide range of tasks such as image recognition/segmentation and machine translation.
Most of the recent results on universal approximation theory concern the Rectified Linear Unit (ReLU) network [5,13,14,15,16,17,18,19,20].
In 2017, Lu et al. [14] presented a universal approximation theorem for deep neural networks with ReLU activation functions and hidden layers of bounded width. Since then, the expressive power of depth in ReLU networks of bounded width has received a lot of attention.
Summary
Over the past several years, deep neural networks have achieved state-of-the-art performance in a wide range of tasks such as image recognition/segmentation and machine translation (see the review article [1] and the recent book [2] for more background). The Rectified Linear Unit (ReLU) activation function is the most popular choice in practical use of neural networks [12]. For this reason, most of the recent results on universal approximation theory concern the ReLU network [5,13,14,15,16,17,18,19,20]. Cohen et al. [13] constructed a deep convolutional neural network with the ReLU activation function that cannot be realized by a shallow network whose hidden layer has no more than an exponential number of nodes. In 2017, Lu et al. [14] presented a universal approximation theorem for deep neural networks with ReLU activation functions and hidden layers of bounded width. Since then, the expressive power of depth in ReLU networks of bounded width has received a lot of attention.
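To make the approximate-identity viewpoint mentioned in the abstract more concrete, the following is a minimal one-dimensional sketch, not the construction used in the paper. It assumes NumPy, and the names `hat`, `relu_approximation`, and `max_error` are hypothetical helpers introduced only for illustration. A triangular bump is built from three ReLU units; scaled bumps act like an approximate identity, and a weighted sum of them over a grid gives a piecewise-linear approximation of a continuous function. Note that this sketch uses a single hidden layer whose width grows with the accuracy, whereas the result summarized above concerns deep networks whose hidden layers have bounded width.

```python
import numpy as np


def relu(x):
    """Rectified Linear Unit, applied elementwise."""
    return np.maximum(x, 0.0)


def hat(x, center, h):
    """Triangular bump of height 1 supported on [center - h, center + h].

    It is a linear combination of three ReLU units, so it is itself a
    (very small) one-hidden-layer ReLU network.
    """
    return (relu(x - (center - h))
            - 2.0 * relu(x - center)
            + relu(x - (center + h))) / h


def relu_approximation(f, a, b, n_pieces):
    """Return a ReLU network (as a Python callable) approximating f on [a, b].

    Assumption for this sketch: f accepts NumPy arrays. The network is the
    sum of f(c_i) * hat(x, c_i, h) over an equally spaced grid c_i, which
    equals the piecewise-linear interpolant of f on that grid.
    """
    h = (b - a) / n_pieces
    centers = a + h * np.arange(n_pieces + 1)
    weights = f(centers)

    def network(x):
        x = np.asarray(x, dtype=float)
        return sum(w * hat(x, c, h) for w, c in zip(weights, centers))

    return network


def max_error(f, network, a, b, n_samples=2001):
    """Largest absolute error between f and the network on a sample grid."""
    xs = np.linspace(a, b, n_samples)
    return float(np.max(np.abs(f(xs) - network(xs))))


if __name__ == "__main__":
    # The error should shrink as the grid is refined.
    for n in (4, 16, 64):
        net = relu_approximation(np.sin, 0.0, np.pi, n)
        print(n, max_error(np.sin, net, 0.0, np.pi))
```

Refining the grid narrows each bump, which is the sense in which the bumps behave like an approximate identity: the weighted sum converges uniformly to the target function on the compact interval.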