Improved Linear Convergence of Training CNNs With Generalizability Guarantees: A One-Hidden-Layer Case.

Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong

doi:10.1109/tnnls.2020.3007399

Abstract

We analyze the learning problem of one-hidden-layer nonoverlapping convolutional neural networks with the rectified linear unit (ReLU) activation function from the perspective of model estimation. The training outputs are assumed to be generated by the neural network with the unknown ground-truth parameters plus some additive noise, and the objective is to estimate the model parameters by minimizing a nonconvex squared loss function of the training data. Assuming that the training set contains a finite number of samples generated from the Gaussian distribution, we prove that the accelerated gradient descent (GD) algorithm with a proper initialization converges to the ground-truth parameters (up to the noise level) with a linear rate even though the learning problem is nonconvex. Moreover, the convergence rate is proved to be faster than that of vanilla GD. The initialization can be achieved by the existing tensor initialization method. In contrast to the existing works that assume an infinite number of samples, we theoretically establish the sample complexity, that is, the required number of training samples. Although the neural network considered here is not deep, this is the first work to show that accelerated GD algorithms can find the global optimizer of the nonconvex learning problem of neural networks. This is also the first work that characterizes the sample complexity of gradient-based methods in learning convolutional neural networks with the nonsmooth ReLU activation function. This work also provides the tightest bound so far on the estimation error with respect to the output noise.
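The setting described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a single shared filter `w` (rather than several), heavy-ball momentum as one common form of accelerated GD, hand-picked `step` and `beta` values, and a starting point placed near the ground truth as a stand-in for the tensor initialization method, which is not implemented here.

```python
import numpy as np

def forward(w, X):
    """Average of ReLU(w . patch) over the M nonoverlapping patches of each sample."""
    return np.maximum(X @ w, 0.0).mean(axis=1)

def loss_grad(w, X, y):
    """Gradient of the squared loss (1/2N) * sum_n (forward(w, x_n) - y_n)^2."""
    pre = X @ w                                      # (N, M) pre-activations
    resid = np.maximum(pre, 0.0).mean(axis=1) - y    # (N,) residuals
    active = (pre > 0).astype(float)                 # ReLU derivative (0 at the kink)
    d_out = np.einsum("nm,nmp->np", active, X) / X.shape[1]
    return (resid[:, None] * d_out).mean(axis=0)

def heavy_ball(w0, X, y, step=1.0, beta=0.3, iters=300):
    """Gradient descent with a heavy-ball momentum term beta * (w_t - w_{t-1})."""
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(iters):
        w_next = w - step * loss_grad(w, X, y) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

# Synthetic data: Gaussian patches, outputs from a ground-truth filter plus noise.
rng = np.random.default_rng(0)
N, M, p = 2000, 4, 5                  # samples, patches per sample, patch size (assumed)
w_star = rng.standard_normal(p)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((N, M, p))
y = forward(w_star, X) + 0.01 * rng.standard_normal(N)

# Assumption: start at distance 0.1 from w_star instead of running tensor initialization.
delta = rng.standard_normal(p)
delta *= 0.1 / np.linalg.norm(delta)
w0 = w_star + delta

w_hat = heavy_ball(w0, X, y)
err_init = np.linalg.norm(w0 - w_star)
err_final = np.linalg.norm(w_hat - w_star)
print(f"initial error: {err_init:.4f}, final error: {err_final:.4f}")
```

Setting `beta=0.0` reduces `heavy_ball` to vanilla GD, which gives a simple way to compare the two iterations on the same data. The final error is expected to settle at a level set by the output noise rather than at zero.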

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Jul 29, 2020
Citations: 6

Similar Papers

Gradient Descent for Non-convex Problems in Modern Machine Learning

27 Jun 2019

Guaranteed Convergence of Training Convolutional Neural Networks via Accelerated Gradient Descent
Shuai Zhang ... Jinjun Xiong
01 Mar 2020

Elastic exponential linear units for convolutional neural networks
Daeho Kim ... Jaeil Kim
Neurocomputing | VOL. 406
26 Mar 2020

Quantum ReLU activation for Convolutional Neural Networks to improve diagnosis of Parkinson’s disease and COVID-19
Luca Parisi ... Felician Campean
Expert Systems with Applications | VOL. 187
14 Sep 2021
