Abstract
An adaptive clamping method (SGD-MS) based on the radius of curvature is designed to alleviate the local optimal oscillation problem in deep neural networks. It combines the radius of curvature of the objective function with the gradient descent of the optimizer: the radius of curvature is used as the threshold to adaptively separate the momentum term or the future gradient moving average term. On this basis, we also propose an accelerated version (SGD-MA), which further improves the convergence speed by using aggregated momentum. Experimental results on several datasets show that the proposed methods effectively alleviate the local optimal oscillation problem and improve both convergence speed and accuracy. This paper thus also provides a novel parameter updating algorithm for deep neural networks.
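As a reminder of the quantity the method is built on, the sketch below numerically estimates the radius of curvature of a one-dimensional curve y = f(x), using the standard formula R = (1 + f'(x)^2)^(3/2) / |f''(x)|. The function name `radius_of_curvature`, the finite-difference step `h`, and the one-dimensional setting are assumptions made for illustration; how the paper evaluates the curvature radius of a network's objective function is not stated in this abstract.

```python
def radius_of_curvature(f, x, h=1e-3):
    """Estimate the radius of curvature of the plane curve y = f(x) at x.

    Uses central finite differences for f'(x) and f''(x).
    Illustrative 1D helper only; not the paper's implementation.
    """
    d1 = (f(x + h) - f(x - h)) / (2 * h)            # approximates f'(x)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)  # approximates f''(x)
    if d2 == 0:
        return float("inf")  # locally straight: infinite radius
    return (1 + d1 * d1) ** 1.5 / abs(d2)


if __name__ == "__main__":
    # A sharply curved bowl has a small radius; a flatter one has a large radius.
    print(radius_of_curvature(lambda x: 10 * x * x, 0.0))
    print(radius_of_curvature(lambda x: 0.1 * x * x, 0.0))
```

A small radius indicates a sharply curved region and a large radius a nearly flat one, which is why the quantity is a natural candidate for a switching threshold.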
Highlights
Deep neural networks have made great achievements in the field of computer vision, such as face recognition [1] and object detection [2]; by constantly deepening the network and enriching the datasets, deep neural networks have significantly improved recognition accuracy.
Computational Intelligence and Neuroscience
... the internal relationship between the curvature radius of the objective function and the gradient descent of the optimizer [13]. Three switching modes are introduced in our SGD-MS: V mode with the momentum term only, D mode with the future gradient moving average term only, and S mode with both terms. This effectively alleviates the local optimal oscillation caused by the accumulation of the momentum term and the system instability caused by the large future gradient at the beginning of training.
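The three modes can be pictured as three ways of combining two precomputed terms. The sketch below is only an illustration of that idea: the names `Mode` and `combine_terms` are hypothetical, the two terms are passed in as plain numbers, and the rule that selects a mode from the curvature radius is deliberately left out because it is not given in this summary.

```python
from enum import Enum


class Mode(Enum):
    V = "velocity"    # momentum term only
    D = "difference"  # future gradient moving average term only
    S = "sum"         # both terms


def combine_terms(mode, momentum_term, future_grad_term):
    """Return the update term used in the given mode.

    Hypothetical helper: how each term is computed, and how the mode
    is chosen from the curvature radius, are not specified here.
    """
    if mode is Mode.V:
        return momentum_term
    if mode is Mode.D:
        return future_grad_term
    return momentum_term + future_grad_term


if __name__ == "__main__":
    for mode in Mode:
        print(mode.name, combine_terms(mode, momentum_term=0.5, future_grad_term=0.25))
```

Keeping the mode as an explicit value separates the question of which terms contribute from the question of when to switch, so a switching rule can be added later without touching the combination step.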
All hyperparameters of the SGD-MS optimizer are set to the same values as for SGD-M, so the comparison shows that the SGD-MS optimizer is superior to SGD-M in precision and converges faster. The SGD-MA optimizer further improves the convergence speed of the SGD-MS optimizer. The experiments were run on a machine with an Intel Core i7-9700u CPU, 32 GB of RAM, and a GeForce RTX 2080Ti GPU.
Summary
Deep neural networks have made great achievements in the field of computer vision, such as face recognition [1] and object detection [2]; by constantly deepening the network and enriching the datasets, deep neural networks have significantly improved recognition accuracy. An adaptive clamping optimization algorithm (SGD-MS) based on the curvature radius is proposed in this paper; it adds a moving average term of the future gradient to SGD-M. Three switching modes are introduced in our SGD-MS: V (velocity) mode with the momentum term only, D (difference) mode with the future gradient moving average term only, and S (sum) mode with both terms. This effectively alleviates the local optimal oscillation caused by the accumulation of the momentum term and the system instability caused by the large future gradient at the beginning of training. The proposed SGD-MS algorithm switches adaptively among the three modes to suit different training stages and effectively alleviates the local optimal oscillation problem.
The NAG optimizer adds a correction factor to the momentum term of the SGD-M optimizer, advancing half a step forward during the parameter update to achieve faster convergence. Ki and Kd are the adjustment coefficients for the integral and differential terms; as in the PID adjustment method, they need to be tuned manually in the experiment. The design structure of the PID algorithm provides inspiration for our optimizer.
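To make the difference between SGD-M and NAG concrete, the sketch below applies both updates to a toy one-dimensional objective. It uses the common look-ahead formulation of NAG, in which the gradient is evaluated at the position reached after the momentum step; the paper's own formula is not reproduced in this summary, so this formulation, the toy objective f(w) = 0.5 * w^2, and the values of `lr` and `mu` are all assumptions for illustration.

```python
def grad(w):
    # Gradient of the toy objective f(w) = 0.5 * w**2 (illustrative assumption).
    return w


def sgdm_step(w, v, lr=0.1, mu=0.9):
    """One SGD-with-momentum step: gradient taken at the current point."""
    v = mu * v - lr * grad(w)
    return w + v, v


def nag_step(w, v, lr=0.1, mu=0.9):
    """One Nesterov step: gradient taken at the look-ahead point w + mu * v."""
    v = mu * v - lr * grad(w + mu * v)
    return w + v, v


def run(step, w0=5.0, steps=200):
    """Apply `step` repeatedly from w0 with zero initial velocity."""
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = step(w, v)
    return w


if __name__ == "__main__":
    print("SGD-M final w:", run(sgdm_step))
    print("NAG   final w:", run(nag_step))
```

The only difference between the two step functions is where the gradient is evaluated, which is the correction to the momentum term described above.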