Abstract
Sparsification of neural networks is an effective complexity reduction method for improving efficiency and generalizability. Binarized activation offers additional computational savings at inference. Because of the vanishing gradient issue in training networks with binarized activation, a coarse gradient (a.k.a. the straight-through estimator) is adopted in practice. In this paper, we study coarse gradient descent (CGD) learning of a one-hidden-layer convolutional neural network (CNN) with binarized activation function and sparse weights. It is known that when the input data are Gaussian distributed, a no-overlap one-hidden-layer CNN with ReLU activation and general weights can be learned by GD in polynomial time with high probability in regression problems with ground truth. We propose a relaxed variable splitting method integrating thresholding and coarse gradient descent. Sparsity in the network weights is realized through thresholding during the CGD training process. We prove that under thresholding of the l_1, l_0, and transformed-l_1 penalties, a no-overlap binary activation CNN can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weights under a novel sparsifying operation. We also derive explicit error estimates of the sparse weights from the true weights.
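As a rough illustration of the coarse-gradient idea described in the abstract, the sketch below is our own simplified Python example, not code from the paper; the function names and the clipped-identity surrogate are illustrative assumptions. It replaces the almost-everywhere-zero derivative of the binary activation with a nonzero proxy in the backward pass, in the spirit of the straight-through estimator.

```python
import numpy as np

def binary_activation(z):
    # Forward pass: binarized activation, 1 if the pre-activation is positive, else 0.
    return (z > 0).astype(float)

def coarse_activation_grad(z):
    # Backward pass: the true derivative is zero almost everywhere, so a coarse
    # (straight-through style) surrogate is used instead -- here the derivative
    # of a clipped identity, equal to 1 on |z| <= 1 and 0 otherwise.
    return (np.abs(z) <= 1.0).astype(float)

# Toy usage: one filter w applied to a single Gaussian input patch x,
# with a squared-error regression loss against a scalar label y.
rng = np.random.default_rng(0)
x = rng.normal(size=5)      # Gaussian input, matching the distributional assumption
w = rng.normal(size=5)      # current filter weights
y = 1.0                     # ground-truth response

z = x @ w
residual = binary_activation(z) - y                        # d(loss)/d(activation) for 0.5*(a - y)^2
coarse_grad_w = residual * coarse_activation_grad(z) * x   # coarse gradient w.r.t. w
w = w - 0.1 * coarse_grad_w                                # one coarse gradient descent step
```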
Highlights
Deep neural networks (DNNs) have achieved state-of-the-art performance on many machine learning tasks such as speech recognition [1], computer vision [2], and natural language processing [3].
We propose a Relaxed Variable Splitting (RVS) approach combining thresholding and coarse gradient descent (CGD) for minimizing the augmented objective function L_β(u, w) = f(w) + λ P(u) + (β/2)‖w − u‖^2, where β is a positive parameter.
We shall prove that our algorithm (RVSCGD), alternately minimizing over u and w, converges for the l_0, l_1, and transformed-l_1 (Tl_1) penalties to a global limit (w, u) with high probability; a minimal illustrative sketch of one alternating update is given below.
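The following sketch is our own illustrative Python under simplifying assumptions (the exact thresholding maps, step sizes, and stopping rules in the paper may differ): the u-update applies the proximal (thresholding) map of (λ/β)P to the current weights, and the w-update takes one coarse gradient step on f(w) + (β/2)‖w − u‖^2.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal map of t * ||.||_1: componentwise soft-thresholding.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def hard_threshold(w, t):
    # Proximal map of t * ||.||_0: keep entries with |w_i| > sqrt(2 t), zero out the rest.
    return np.where(np.abs(w) > np.sqrt(2.0 * t), w, 0.0)

def rvscgd_step(w, coarse_grad_f, lam, beta, lr, penalty="l1"):
    """One alternating update of the relaxed variable splitting scheme (illustrative).

    u-update: u = prox_{(lam/beta) P}(w), i.e. thresholding of the current weights.
    w-update: one coarse gradient descent step on f(w) + (beta/2) * ||w - u||^2.
    """
    u = soft_threshold(w, lam / beta) if penalty == "l1" else hard_threshold(w, lam / beta)
    w = w - lr * (coarse_grad_f(w) + beta * (w - u))
    return w, u
```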
Summary
Deep neural networks (DNNs) have achieved state-of-the-art performance on many machine learning tasks such as speech recognition [1], computer vision [2], and natural language processing [3]. Several works [6, 7] focused on the geometric properties of loss functions, an analysis made possible by assuming that the input data distribution is Gaussian. They showed that SGD with random or zero initialization can train a no-overlap neural network in polynomial time. A surrogate l_0 regularization approach, based on a continuous relaxation of Bernoulli random variables in the distributional sense, was introduced with encouraging results on small image data sets [13]. This motivated our work here to study deterministic regularization of l_0 via its Moreau envelope and related l_1 penalties in a one-hidden-layer convolutional neural network model [7]. As pointed out in Louizos et al. [13], it is beneficial to attain sparsity during the optimization (training) process.
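For concreteness, the Moreau-type envelope of the l_0 penalty mentioned above admits a closed form; the calculation below is standard and stated in our own notation (λ, β as in the augmented objective), which may differ from the paper's.

```latex
% Componentwise minimization: either u_i = w_i (cost lambda) or u_i = 0 (cost (beta/2) w_i^2).
\min_{u}\Big\{\lambda\|u\|_0+\tfrac{\beta}{2}\|w-u\|^2\Big\}
  =\sum_i \min\!\Big(\lambda,\ \tfrac{\beta}{2}\,w_i^2\Big),
\qquad
u_i^\ast=\begin{cases} w_i, & |w_i|>\sqrt{2\lambda/\beta},\\ 0, & \text{otherwise},\end{cases}
```

so the minimizer is exactly the hard-thresholding map used in the sparsifying u-update.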