Abstract

Deep learning has achieved unprecedented success in recent years. This approach essentially uses the composition of nonlinear functions to model the complex relationship between input features and output labels. However, a comprehensive theoretical understanding of why the hierarchical layered structure can exhibit superior expressive power is still lacking. In this paper, we provide an explanation for this phenomenon by measuring the approximation efficiency of neural networks with respect to discontinuous target functions. We focus on deep neural networks with rectified linear unit (ReLU) activation functions. We find that, to achieve the same degree of approximation accuracy, the number of neurons required by a single-hidden-layer (SHL) network is exponentially greater than that required by a multi-hidden-layer (MHL) network. In practice, discontinuous points tend to contain highly valuable information (e.g., edges in image classification). We argue that this may be an important reason for the impressive performance of deep neural networks. We validate our theory through extensive experiments.
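
The sketch below is an illustrative experiment, not the paper's own setup: it trains a single-hidden-layer and a multi-hidden-layer ReLU network of comparable parameter count on a discontinuous 1D target (a step function) and reports the final approximation error. The widths, learning rate, and number of steps are assumptions chosen for the example only.

```python
# Illustrative sketch (assumed setup, not the authors' experiments):
# compare how a single-hidden-layer (SHL) and a multi-hidden-layer (MHL)
# ReLU network approximate a discontinuous 1D target at a comparable
# parameter budget.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Discontinuous target: a step function with a jump at x = 0.
def target(x):
    return (x > 0).float()

x = torch.linspace(-1.0, 1.0, 2048).unsqueeze(1)
y = target(x)

# SHL: one hidden layer of width 200 (601 parameters).
shl = nn.Sequential(nn.Linear(1, 200), nn.ReLU(), nn.Linear(200, 1))

# MHL: three hidden layers of width 16 (593 parameters).
mhl = nn.Sequential(
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

def train(model, steps=3000):
    # Plain mean-squared-error regression onto the step function.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for name, model in [("SHL", shl), ("MHL", mhl)]:
    n_params = sum(p.numel() for p in model.parameters())
    final_mse = train(model)
    print(f"{name}: {n_params} parameters, final MSE {final_mse:.4f}")
```

Under this kind of setup, the deeper network typically fits the jump more sharply at a similar parameter count, which is the qualitative behavior the abstract's exponential separation result describes.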
