Optimal approximation rate of ReLU networks in terms of width and depth

Zuowei Shen, Haizhao Yang, Shijun Zhang

doi:10.1016/j.matpur.2021.07.009

Open Access

Abstract

This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $O(\max\{d\lfloor N^{1/d}\rfloor, N+2\})$ and depth $O(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $O(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d})$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and the Hölder constant, respectively. This rate is optimal up to a constant in terms of width and depth separately, whereas existing results, whose rates lack this logarithmic factor, are only nearly optimal. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $O(\sqrt{d}\,\omega_f((N^2L^2\ln N)^{-1/d}))$, where $\omega_f(\cdot)$ is the modulus of continuity of $f$. We also extend our analysis to any continuous function $f$ on a bounded set. In particular, if ReLU networks with depth 31 and width $O(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=O(N^2)$, becomes $O(\lambda/(W\ln W))$. This rate had not previously been established in the literature for fixed-depth ReLU networks.
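The bounds above are stated only up to constants. The Python sketch below evaluates the width scale and the Hölder error scale for concrete values of $N$, $L$, $d$. It sets every hidden $O(\cdot)$ constant to 1 and takes the parameter count to be exactly $W=N^2$; both are illustrative assumptions, not values from the paper. The network construction itself is not reproduced, and the helper names (`int_root`, `width_scale`, `holder_rate`, `lipschitz_1d_ratio`) are hypothetical. The last helper checks the one-dimensional remark. Since $\ln W = \ln N^2 = 2\ln N$, the quantities $(N^2\ln N)^{-1}$ and $(W\ln W)^{-1}$ differ only by the factor 2, which is why the fixed-depth rate can be written as $O(\lambda/(W\ln W))$.

```python
import math


def int_root(n: int, d: int) -> int:
    """Largest integer r with r**d <= n.

    Computed exactly, because n ** (1/d) in floating point can land just
    below an integer (e.g. 64 ** (1/3) == 3.9999999999999996).
    """
    r = int(round(n ** (1.0 / d)))
    while r > 0 and r ** d > n:
        r -= 1
    while (r + 1) ** d <= n:
        r += 1
    return r


def width_scale(N: int, d: int) -> int:
    """max{d * floor(N^(1/d)), N + 2}; the O(.) constant is assumed to be 1."""
    return max(d * int_root(N, d), N + 2)


def holder_rate(N: int, L: int, d: int, alpha: float = 1.0, lam: float = 1.0) -> float:
    """lam * sqrt(d) * (N^2 L^2 ln N)^(-alpha/d), hidden constant assumed to be 1."""
    if N < 2 or L < 1 or d < 1 or not 0 < alpha <= 1:
        raise ValueError("need N >= 2, L >= 1, d >= 1 and 0 < alpha <= 1")
    return lam * math.sqrt(d) * (N**2 * L**2 * math.log(N)) ** (-alpha / d)


def lipschitz_1d_ratio(N: int) -> float:
    """Ratio of (N^2 ln N)^-1 to (W ln W)^-1 for the assumed count W = N^2."""
    W = N**2
    return (1.0 / (N**2 * math.log(N))) / (1.0 / (W * math.log(W)))


if __name__ == "__main__":
    # Width and error scales for a 4-dimensional target at fixed depth scale L = 8.
    for N in (16, 256, 4096):
        print(N, width_scale(N, d=4), holder_rate(N, L=8, d=4))
    # ln(N^2) = 2 ln N, so this prints 2.0 up to floating-point rounding.
    print(lipschitz_1d_ratio(1000))
```

With $d=1$ and $\alpha=1$, doubling $L$ divides the error scale by exactly 4. Doubling $N$ divides it by slightly more than 4 because of the $\ln N$ factor. In higher dimensions, the exponent $\alpha/d$ damps both effects. In the depth-31 Lipschitz case, the fixed factor $L^2 = 31^2$ is absorbed into the constant.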