Abstract

Despite the powerful expressivity of neural networks with nonlinear activation functions, the underlying mechanism of deep neural networks remains unclear. However, it can be proved that ultra-wide neural networks are equivalent to Gaussian processes, connecting the analysis of neural networks with Bayesian statistics and kernel methods. Moreover, recent studies on infinitely wide neural networks extend this correspondence to a specific kernel, named the Neural Tangent Kernel (NTK), which governs the learning dynamics of the associated networks. Independently of the particular values of the weights and biases, the NTK recursively encodes the architectural information of the corresponding neural network, including the activation function at each hidden layer. Inspired by this close relationship between Gaussian processes and neural networks, we propose a heuristic search method for the activation functions of sufficiently wide neural networks in the NTK regime. To obtain an elegant, closed-form computation, activation functions are decomposed in the basis of Hermite polynomials, which converts the Gaussian-process kernels into power series. Experiments show that the obtained nonlinearities outperform other common activation functions. This work also reveals the potential of NTKs to guide neural network structure search in the future.
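As a sketch of the conversion alluded to above (the specific normalization is an assumption here, not stated in the abstract): if $h_n$ denotes the probabilists' Hermite polynomials normalized to be orthonormal under the standard Gaussian measure, and the preactivations fed to $\sigma$ are jointly standard Gaussian with correlation $\rho$, then the classical Mehler/Hermite identity turns the Gaussian-process kernel into a power series in $\rho$:

\[
\sigma(x) \;=\; \sum_{n=0}^{\infty} a_n\, h_n(x)
\quad\Longrightarrow\quad
\mathbb{E}\big[\sigma(X)\,\sigma(Y)\big] \;=\; \sum_{n=0}^{\infty} a_n^{2}\,\rho^{\,n},
\qquad \mathbb{E}[XY] = \rho,
\]

so the kernel at each layer, and hence the NTK built from it, is determined by the Hermite coefficients $(a_n)$ of the activation function.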
