Abstract

Despite the widespread success of deep learning in various applications, neural network theory has been lagging behind. The choice of activation function plays a critical role in the expressivity of a neural network, for reasons that are not yet fully understood. While the rectified linear unit (ReLU) is currently one of the most popular activation functions, ReLU squared has only recently been empirically shown to be pivotal in producing consistently superior results for state-of-the-art deep learning tasks (So et al., 2021). To analyze the expressivity of neural networks with ReLU powers, we employ the novel framework of Gribonval et al. (2022) based on the classical concept of approximation spaces. We consider the class of functions for which the approximation error decays at a sufficiently fast rate as network complexity, measured by the number of weights, increases. We show that when approximating sufficiently smooth functions that cannot be represented by sufficiently low-degree polynomials, networks with ReLU powers need less depth than those with ReLU. Moreover, if they have the same depth, networks with ReLU powers can have potentially faster approximation rates. Lastly, our computational experiments on approximating the Rastrigin and Ackley functions with deep neural networks show that ReLU squared and ReLU cubed networks consistently outperform ReLU networks.
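To make the experimental comparison concrete, the sketch below fits small multilayer perceptrons with ReLU, ReLU squared, and ReLU cubed activations to the 2-D Rastrigin function. It is a minimal illustration only, not the authors' implementation: the architecture, width, depth, sample size, optimizer, and training settings are assumptions chosen for readability, and the Ackley function could be swapped in analogously.

```python
# Minimal sketch (assumed setup, not the paper's code): compare ReLU^p activations
# for p = 1, 2, 3 on regression of the 2-D Rastrigin function.
import torch
import torch.nn as nn

class ReLUPower(nn.Module):
    """Activation sigma_p(x) = max(0, x)^p; p = 1 recovers the plain ReLU."""
    def __init__(self, p: int):
        super().__init__()
        self.p = p

    def forward(self, x):
        return torch.relu(x) ** self.p

def rastrigin(x):
    # Rastrigin function: f(x) = 10*d + sum_i (x_i^2 - 10*cos(2*pi*x_i))
    return 10 * x.shape[-1] + (x**2 - 10 * torch.cos(2 * torch.pi * x)).sum(-1, keepdim=True)

def make_mlp(p, dim=2, width=64, depth=4):
    # Fully connected network with `depth` hidden layers and ReLU^p activations.
    layers, in_dim = [], dim
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), ReLUPower(p)]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))
    return nn.Sequential(*layers)

torch.manual_seed(0)
x_train = torch.empty(4096, 2).uniform_(-5.12, 5.12)  # standard Rastrigin domain
y_train = rastrigin(x_train)

for p in (1, 2, 3):  # ReLU, ReLU squared, ReLU cubed
    model = make_mlp(p)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x_train), y_train)
        loss.backward()
        opt.step()
    print(f"p={p}: final train MSE = {loss.item():.4f}")
```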
