Abstract

An artificial neural network consists of neurons and synapses. A neuron produces its output from its input through a non-linear activation function such as the Sigmoid, Hyperbolic Tangent (Tanh), or Rectified Linear Unit (ReLU) function. Synapses connect neuron outputs to neuron inputs through tunable real-valued weights. The most resource-demanding operations in realizing such neural networks are the multiply-and-accumulate (MAC) operations that compute the dot products between the real-valued neuron outputs and the synapse weights. The efficiency of neural networks can be drastically enhanced if the neuron outputs and/or the weights can be trained to take binary values ±1 only, in which case the MAC operations can be replaced by simple XNOR operations. In this paper, we demonstrate an adiabatic training method that binarizes fully-connected and convolutional neural networks without modifying the network structure or size. The adiabatic training method requires only minimal changes to the training algorithm, and it is tested on four tasks: handwritten-digit recognition using a standard fully-connected network, cat-dog recognition and audio recognition using convolutional neural networks, and 10-class image recognition (CIFAR-10) using the ResNet-20 and VGG-Small networks. In all tasks, the performance of the binary neural networks trained by the adiabatic method is nearly identical to that of the networks trained with conventional ReLU or Sigmoid activations and real-valued activations and weights. The adiabatic method can be easily applied to binarize different types of networks; it considerably increases computational efficiency and greatly simplifies the deployment of neural networks.
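
To see why binarization pays off, the sketch below (plain NumPy, not code from the paper) checks that the dot product between two {−1, +1} vectors equals an XNOR of their sign bits followed by a bit count, which is what lets XNOR operations replace the multiply-and-accumulate:

    import numpy as np

    def mac_dot(a, b):
        # Ordinary multiply-and-accumulate on two vectors.
        return int(np.dot(a, b))

    def xnor_dot(a, b):
        # Encode +1 -> 1 and -1 -> 0, XNOR the bits, count the matches.
        matches = np.count_nonzero(~((a > 0) ^ (b > 0)))  # XNOR = NOT XOR
        return 2 * matches - len(a)  # dot = matches - mismatches

    rng = np.random.default_rng(0)
    a = rng.choice([-1, 1], size=16)
    b = rng.choice([-1, 1], size=16)
    assert mac_dot(a, b) == xnor_dot(a, b)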

Highlights

  • An artificial neural network consists of neurons and synapses

  • The parametrized function used in Eq. (2) has some similarity with the Differentiable Soft Quantization (DSQ) method proposed by Gong et al. [13], but the tunable width used in this work is treated as a global, non-trainable, time-evolving parameter; ProxQuant by Bai et al. [16] used a time-evolving regularizer to binarize weights, but it cannot regularize activations in a similar manner

  • With the four tasks demonstrated above, we showed that, by adiabatically shrinking the width of the activation and weight distributions towards zero, existing neural networks can be trained to work with binarized Heaviside-activated neurons and/or binarized weights (see the sketch after this list)
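
The sketch below illustrates the width-annealing idea in generic NumPy code. It is not the paper's Eq. (2); the tanh-based soft activation and the linear width schedule are assumptions made only for illustration.

    import numpy as np

    def soft_binarize(x, w):
        # Hypothetical smooth surrogate: tanh(x / w) approaches sign(x) as w -> 0.
        return np.tanh(x / max(w, 1e-8))

    def width_schedule(epoch, num_epochs, w_start=1.0, w_end=1e-3):
        # Linear annealing of the global, non-trainable width; the schedule
        # actually used in the paper may differ.
        t = epoch / max(num_epochs - 1, 1)
        return w_start + t * (w_end - w_start)

    # Usage: at epoch e, pass pre-activations (and/or weights) through
    # soft_binarize(x, width_schedule(e, num_epochs)); as the width goes to
    # zero, the outputs approach the binary values -1 and +1.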


Introduction

An artificial neural network consists of neurons and synapses. A neuron produces its output from its input through a non-linear activation function. The most widely used activation functions include the Sigmoid function sgmd(x) = 1/[1 + exp(−x)], the Hyperbolic Tangent function tanh(x), and the Rectified Linear Unit function ReLU(x) = max(x, 0) [6,7]. All of these activation functions require multiple bits for storage or processing.
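
For concreteness, these activations can be written as follows, together with a one-bit sign activation for contrast (a generic sketch, not code from the paper):

    import numpy as np

    def sgmd(x):
        return 1.0 / (1.0 + np.exp(-x))     # Sigmoid: output in (0, 1)

    def tanh(x):
        return np.tanh(x)                    # Hyperbolic Tangent: output in (-1, 1)

    def relu(x):
        return np.maximum(x, 0.0)            # ReLU: non-negative real output

    def binary_sign(x):
        return np.where(x >= 0, 1.0, -1.0)   # Binary activation: one bit per value suffices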
