Abstract

While non-linear activation functions play vital roles in artificial neural networks, it is generally unclear how the non-linearity can improve the quality of function approximations. In this paper, we present a theoretical framework to rigorously analyze the performance gain of using non-linear activation functions for a class of residual neural networks (ResNets). In particular, we show that when the input features for the ResNet are uniformly chosen and orthogonal to each other, using non-linear activation functions to generate the ResNet output averagely outperforms using linear activation functions, and the performance gain can be explicitly computed. Moreover, we show that when the activation functions are chosen as polynomials with the degree much less than the dimension of the input features, the optimal activation functions can be precisely expressed in the form of Hermite polynomials. This demonstrates the role of Hermite polynomials in function approximations of ResNets.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call