Abstract
We present the lifted proximal operator machine (LPOM) to train fully-connected feed-forward neural networks. LPOM represents the activation function as an equivalent proximal operator and adds the proximal operators to the objective function of a network as penalties. LPOM is block multi-convex in all layer-wise weights and activations. This allows us to develop a new block coordinate descent (BCD) method with a convergence guarantee to solve it. Owing to this formulation and solution method, LPOM uses only the activation function itself and does not require any gradient steps. It thus avoids the gradient vanishing and exploding issues that often plague gradient-based methods. It can also handle various non-decreasing Lipschitz-continuous activation functions. Additionally, LPOM is almost as memory-efficient as stochastic gradient descent, and its parameter tuning is relatively easy. We further implement and analyze the parallel solution of LPOM. We first propose a general asynchronous-parallel BCD method with a convergence guarantee. We then use it to solve LPOM, resulting in asynchronous-parallel LPOM. To obtain faster training, we develop the synchronous-parallel LPOM. We validate the advantages of LPOM on various network architectures and datasets. We also apply synchronous-parallel LPOM to autoencoder training and demonstrate its fast convergence and superior performance.
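As an illustrative instance of the proximal-operator view (stated here for exposition only; it is not claimed to be the paper's exact construction), the ReLU activation is itself a proximal operator: ReLU(x) = max(x, 0) = \arg\min_{u \ge 0} \tfrac{1}{2}(u - x)^2 = \operatorname{prox}_{\iota_{\ge 0}}(x), i.e., the proximal map of the indicator function of the nonnegative orthant. In an LPOM-style objective, such proximal characterizations of each layer's activation are added as penalty terms, so that the resulting problem is convex in each block of weights or activations when the other blocks are held fixed.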