Block layer decomposition schemes for training deep neural networks

Laura Palagi, Ruggiero Seccia

doi:10.1007/s10898-019-00856-0

Abstract

Estimating the weights of Deep Feedforward Neural Networks (DFNNs) requires solving a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points, and large plateaus. As a consequence, optimization algorithms can be attracted toward local minimizers, which can lead to bad solutions or slow down the optimization process. Furthermore, the time needed to find good solutions to the training problem depends on both the number of samples and the number of variables. In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve the performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method able to effectively tackle the network's depth, and then extend the algorithm by proposing a minibatch BCD framework that scales with respect to both the number of variables and the number of samples by embedding a BCD approach into a minibatch framework. Through extensive numerical results on standard datasets for several network architectures, we show that applying BCD methods to the training phase of DFNNs makes it possible to outperform standard batch and minibatch algorithms, improving both the training phase and the generalization performance of the networks.
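
To make the idea of a layer-wise block decomposition concrete, the following is a minimal NumPy sketch, not the authors' algorithm. It assumes a small tanh network with a linear output and a mean squared error loss, treats the weights of each layer as one block, and takes a few plain gradient steps on one block at a time while the other layers stay fixed. The function names, the layer ordering (input to output), the step size, and the toy data are all illustrative assumptions; passing `batch_size` switches from the batch variant to a simple minibatch variant.

```python
import numpy as np

def forward(weights, X):
    """Return the activations of every layer (tanh hidden units, linear output)."""
    acts = [X]
    for i, W in enumerate(weights):
        z = acts[-1] @ W
        acts.append(z if i == len(weights) - 1 else np.tanh(z))
    return acts

def loss(weights, X, y):
    """Half the mean squared error over the given samples."""
    return 0.5 * np.mean((forward(weights, X)[-1] - y) ** 2)

def layer_gradient(weights, X, y, layer):
    """Gradient of the loss with respect to the weights of one layer only."""
    acts = forward(weights, X)
    delta = (acts[-1] - y) / y.size  # derivative w.r.t. the linear output
    for i in range(len(weights) - 1, layer, -1):
        # Back-propagate through layer i and the tanh that feeds it
        delta = (delta @ weights[i].T) * (1.0 - acts[i] ** 2)
    return acts[layer].T @ delta

def block_layer_descent(weights, X, y, epochs=50, inner_steps=5, lr=0.05,
                        batch_size=None, seed=0):
    """Cycle over layers, updating one block of weights at a time.

    Assumption: each block update is a few gradient steps; any other
    inner solver could be substituted here.
    """
    rng = np.random.default_rng(seed)
    weights = [W.copy() for W in weights]  # leave the caller's arrays untouched
    n = X.shape[0]
    for _ in range(epochs):
        for layer in range(len(weights)):
            for _ in range(inner_steps):
                if batch_size is None:      # batch variant: all samples
                    Xb, yb = X, y
                else:                       # minibatch variant: random subset
                    idx = rng.choice(n, size=batch_size, replace=False)
                    Xb, yb = X[idx], y[idx]
                weights[layer] -= lr * layer_gradient(weights, Xb, yb, layer)
    return weights

# Toy regression problem (hypothetical data, for illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))
init = [0.5 * rng.normal(size=s) for s in [(3, 8), (8, 8), (8, 1)]]

before = loss(init, X, y)
after_batch = loss(block_layer_descent(init, X, y), X, y)
after_minibatch = loss(block_layer_descent(init, X, y, batch_size=16), X, y)
print(before, after_batch, after_minibatch)
```

Each inner update only needs the gradient of one layer, which is the sense in which such a scheme can scale with the number of variables; sampling a subset of rows per update addresses the number of samples.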


Journal: Journal of Global Optimization	Publication Date: Nov 15, 2019
Citations: 3

Similar Papers

Block Coordinate Descent Methods for Semidefinite Programming
Zaiwen Wen ... Donald Goldfarb
26 Sep 2011

On the flexibility of block coordinate descent for large-scale optimization
Xiangfeng Wang ... Hongyuan Zha
Neurocomputing | VOL. 272
22 Jul 2017

Sparse Approximation via Penalty Decomposition Methods
Zhaosong Lu ... Yong Zhang
SIAM Journal on Optimization | VOL. 23
01 Jan 2013

ℓ0-minimization methods for image restoration problems based on wavelet frames
Jian Lu ... Zhaosong Lu
Inverse Problems | VOL. 35
29 May 2019
