Abstract
Stochastic gradient descent (SGD) is the fundamental sequential method for training large-scale machine learning models. To accelerate training, researchers have proposed the asynchronous stochastic gradient descent (A-SGD) method. However, because of stale information when updating parameters, A-SGD converges more slowly than SGD for the same number of iterations. Moreover, A-SGD often converges to a higher loss value and thus lower model accuracy. In this paper, we propose a novel algorithm called Trend-Smooth that can be adapted to the asynchronous parallel environment to overcome these problems. Specifically, Trend-Smooth uses the parameter trend observed during training to shrink the learning rate in dimensions where the gradient's direction is opposite to the parameter's trend. Experiments on the MNIST and CIFAR-10 datasets confirm that Trend-Smooth accelerates convergence in asynchronous training. The test accuracy of Trend-Smooth is higher than that of other asynchronous parallel baseline methods and very close to that of SGD. Moreover, Trend-Smooth can also be combined with adaptive learning rate methods (such as Momentum, RMSProp and Adam) in the asynchronous parallel environment to improve their performance.
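The core idea described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's exact update rule: we assume the "parameter trend" is tracked as an exponential moving average of past updates, and that the learning rate is shrunk by a fixed factor in dimensions where the proposed step opposes that trend. The names `trend_decay` and `shrink` are hypothetical parameters introduced here for illustration.

```python
import numpy as np

def trend_smooth_update(w, grad, trend, lr, trend_decay=0.9, shrink=0.1):
    """One hypothetical Trend-Smooth step (a sketch, not the paper's rule).

    `trend` is an exponential moving average of past parameter updates.
    Dimensions where the current step would move against the trend get
    their learning rate shrunk by the factor `shrink`.
    """
    step = -lr * grad                      # plain per-dimension SGD step
    opposes = step * trend < 0             # True where the step fights the trend
    effective_lr = np.where(opposes, lr * shrink, lr)
    update = -effective_lr * grad
    trend = trend_decay * trend + (1 - trend_decay) * update
    return w + update, trend
```

On a simple quadratic loss this behaves like SGD while the parameters move consistently in one direction, and damps oscillations once gradients start flipping sign, which is the intuition behind smoothing out stale-gradient noise.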
Highlights
Stochastic gradient descent (SGD) is the most widely used and fundamental sequential method for training machine learning models
We find that the parameter curves of the asynchronous stochastic gradient descent (A-SGD) method exhibit trends similar to those of the SGD method observed in previous work [8]
Our work builds on an analysis of how stale gradients affect the parameter curves in the A-SGD method
Summary
Stochastic gradient descent (SGD) is the most widely used and fundamental sequential method for training machine learning models. In each iteration, it uses a small subset of the whole dataset to compute gradients and uses them to update the model parameters. Trend-Smooth can be combined with adaptive learning rate methods such as Momentum, RMSProp and Adam to improve their performance in the asynchronous parallel environment. We conduct experiments to verify that Trend-Smooth speeds up training convergence and achieves higher test accuracy (very close to that of the SGD method) than other asynchronous parallel methods. We use DC-ASGD as one of our baseline methods.
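The per-iteration mini-batch update described above can be sketched as follows. This is a generic mini-batch SGD example on a synthetic linear-regression problem, written for illustration; the dataset, batch size, and learning rate are our own choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))             # synthetic dataset
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch)  # sample a small subset
    err = X[idx] @ w - y[idx]
    grad = X[idx].T @ err / batch              # gradient on the batch only
    w -= lr * grad                             # update the model parameters
```

Each iteration touches only `batch` examples rather than the full dataset, which is what makes the method cheap per step and also what A-SGD parallelizes across workers.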