Abstract

Activation functions provide deep neural networks with the non-linearity necessary to learn complex distributions, yet the optimal shape of the activation function remains an open question. In this work, we introduce a novel type of activation function whose shape is learned during network training. The proposed Look-up Table Unit (LuTU) stores a set of anchor points in a look-up-table-like structure, and the activation function is generated from these anchor points either by linear interpolation or by smoothing with a single-period cosine mask function. In theory, LuTU can approximate any univariate function. By observing the shapes learned by LuTU, we further propose a Mixture of Gaussian Unit (MoGU) that can learn similar non-linear shapes with far fewer parameters. Finally, we employ a fusion framework that combines multiple types of activation functions to achieve better performance; with the linear-interpolation approximation, the inference complexity of this fusion remains constant. Our experiments on a synthetic dataset, ImageNet, and CIFAR-10 demonstrate that the proposed method outperforms traditional ReLU-family activation functions. On the ImageNet dataset, our method achieves 1.47% and 1.0% higher accuracy on ResNet-18 and ResNet-34 models, respectively. With the proposed activation function, we can design a network that matches the performance of ResNet-34 with eight fewer convolutional layers.
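To make the look-up-table idea concrete, the following is a minimal sketch (not the authors' reference implementation) of a learnable activation built from anchor points and piecewise-linear interpolation, as described above. The anchor range, number of anchors, ReLU-based initialization, and the module name `LuTULinear` are assumptions introduced for illustration only.

```python
# Illustrative sketch of a look-up-table activation with learnable anchor
# values and piecewise-linear interpolation between them.
import torch
import torch.nn as nn


class LuTULinear(nn.Module):
    def __init__(self, num_anchors: int = 21, x_min: float = -5.0, x_max: float = 5.0):
        super().__init__()
        # Fixed, uniformly spaced anchor positions; only the anchor values are learned.
        self.register_buffer("anchor_x", torch.linspace(x_min, x_max, num_anchors))
        # Initialize anchor values to ReLU so training starts from a familiar shape
        # (initialization choice is an assumption, not taken from the paper).
        self.anchor_y = nn.Parameter(torch.relu(self.anchor_x).clone())
        self.x_min, self.x_max = x_min, x_max
        self.step = (x_max - x_min) / (num_anchors - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clamp inputs to the table range, then interpolate linearly between
        # the two neighbouring anchors.
        xc = x.clamp(self.x_min, self.x_max)
        idx = ((xc - self.x_min) / self.step).floor().long()
        idx = idx.clamp(0, self.anchor_y.numel() - 2)
        x0 = self.anchor_x[idx]
        y0, y1 = self.anchor_y[idx], self.anchor_y[idx + 1]
        w = (xc - x0) / self.step
        return y0 + w * (y1 - y0)


if __name__ == "__main__":
    act = LuTULinear()
    out = act(torch.randn(4, 8))    # learnable activation applied elementwise
    out.sum().backward()            # gradients flow into the anchor values
    print(act.anchor_y.grad.shape)  # torch.Size([21])
```

Because the anchor values are ordinary parameters, they are updated by backpropagation along with the rest of the network; the cosine-mask smoothing variant mentioned in the abstract would replace the piecewise-linear interpolation step with a smooth weighting of neighbouring anchors.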
