Learning Activation Functions in Deep (Spline) Neural Networks

Pakshal Bohra,Joaquim Campos,Shayan Aziznejad,Harshit Gupta,Michael Unser

doi:10.1109/ojsp.2020.3039379

Open Access

Abstract

We develop an efficient computational solution to train deep neural networks (DNNs) with free-form activation functions. To make the problem well-posed, we augment the cost functional of the DNN by adding an appropriate shape regularization: the sum of the second-order total variations of the trainable nonlinearities. The representer theorem for DNNs tells us that the optimal activation functions are adaptive piecewise-linear splines, which allows us to recast the problem as a parametric optimization. The challenging point is that the corresponding basis functions (ReLUs) are poorly conditioned and that the determination of their number and positioning is also part of the problem. We circumvent the difficulty by using an equivalent B-spline basis to encode the activation functions and by expressing the regularization as an $\ell_1$-penalty. This results in the specification of parametric activation function modules that can be implemented and optimized efficiently on standard development platforms. We present experimental results that demonstrate the benefit of our approach.
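To make the B-spline encoding and the $\ell_1$-penalty concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a uniform knot grid with spacing `step`, linear extrapolation outside the grid, and a thresholding tolerance `tol`; the function names and these parameters are hypothetical and chosen for illustration only. Under these assumptions, the activation is stored through its values `coeffs` at the knots (the coefficients of the linear B-spline, or hat-function, expansion), and its second-order total variation equals the $\ell_1$-norm of the second finite differences of `coeffs` divided by `step`, since each second difference divided by `step` is the change of slope at a knot.

```python
import numpy as np

def linear_spline_activation(x, coeffs, grid_min, step):
    """Evaluate a piecewise-linear spline given its values at uniform knots.

    coeffs[k] is the spline value at knot grid_min + k * step, i.e. the
    coefficient of the k-th linear B-spline (hat function).
    Assumption: linear extrapolation outside the knot range.
    """
    x = np.asarray(x, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    pos = (x - grid_min) / step                      # position in knot units
    idx = np.floor(pos).astype(int)
    idx = np.clip(idx, 0, len(coeffs) - 2)           # index of the left knot
    t = pos - idx                                    # not clipped: extrapolates linearly
    return coeffs[idx] * (1.0 - t) + coeffs[idx + 1] * t

def second_order_tv(coeffs, step):
    """Second-order total variation: sum of absolute slope changes at the knots."""
    return np.sum(np.abs(np.diff(coeffs, n=2))) / step

def count_active_knots(coeffs, step, tol=1e-8):
    """Count knots where the slope actually changes (hypothetical tolerance tol)."""
    return int(np.sum(np.abs(np.diff(coeffs, n=2)) / step > tol))

# Example: a ReLU represented on the knots -2, -1, 0, 1, 2.
knots = np.arange(-2.0, 3.0, 1.0)
relu_coeffs = np.maximum(knots, 0.0)
values = linear_spline_activation([-3.0, -0.5, 0.5, 3.0], relu_coeffs, -2.0, 1.0)
tv2 = second_order_tv(relu_coeffs, 1.0)              # 1.0: one unit change of slope
active = count_active_knots(relu_coeffs, 1.0)        # 1: only the knot at 0 is active
```

In a training loop, a penalty of this form would be summed over all activation functions, scaled by a regularization weight, and added to the data-fidelity loss; because it is an $\ell_1$-norm, it favors splines in which few knots are active.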

Highlights

During the past decade, deep neural networks (DNNs) have evolved into a major player for machine learning
We investigate the effect of the regularization parameter λ on the number of active knots in the learned spline activation functions and the performance of the neural network

Summary

Introduction

Deep neural networks (DNNs) have evolved into a major player for machine learning. They have been found to outperform the traditional techniques of statistical learning [1] (e.g., kernel methods, support-vector machines, random forests) in many real-world applications that include image classification [2], speech recognition [3], image segmentation [4], and medical imaging [5]. The basic principle behind DNNs is to construct powerful learning architectures via the composition of simple basic modules; that is, linear (or affine) transformations and pointwise nonlinearities [6]. The qualifier “deep” refers to the depth (or number of layers) of such a composition, which is typically much larger than one. A given layer of the network is characterized by
