Abstract

Adaptive gradient methods such as adaptive moment estimation (Adam), RMSProp, and adaptive gradient (AdaGrad) use the temporal history of gradient updates to speed up convergence and reduce reliance on manual learning-rate tuning, making them a popular choice for off-the-shelf Deep Neural Network (DNN) optimizers. In this article, we study the robustness of neural network optimizers in the presence of training perturbations. We show that popular adaptive optimization methods generalize poorly when learning from noisy training data, compared to vanilla Stochastic Gradient Descent (SGD) and its variants, which exhibit better implicit regularization properties. We construct an illustrative family of two-class, linearly separable toy datasets on which models trained under noise with adaptive optimizers reach only 52% test accuracy (close to a random classifier), whereas SGD-based methods achieve 100% test accuracy. We strengthen this hypothesis with an empirical analysis using Convolutional Neural Networks (CNNs) on publicly available image datasets: we train models with various optimizers on noisy training data and compute test accuracy on clean test data. Our results further highlight the robustness of SGD optimization to such noisy training data compared to its adaptive counterparts. Based on these results, we suggest reconsidering the extensive use of adaptive gradient methods for neural network optimization, especially when the training data is noisy.
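
The following is a minimal sketch of this kind of toy experiment, not the paper's exact construction: the data-generating process (labels given by the sign of one coordinate), the label-flip noise model, and all hyperparameters are illustrative assumptions, and this sketch need not reproduce the reported 52% vs. 100% gap. It trains the same linear classifier on noisy, linearly separable two-class data with Adam and with SGD, then evaluates both on a clean test set.

    # Hedged sketch: compare Adam vs. SGD trained on noisy, linearly
    # separable two-class data, evaluating on clean test data. The dataset
    # family, noise model, and hyperparameters are illustrative assumptions.
    import torch

    torch.manual_seed(0)

    def make_data(n, d=100, flip_frac=0.0):
        # Labels in {-1, +1} given by the sign of the first coordinate,
        # so the clean data are linearly separable.
        X = torch.randn(n, d)
        y = torch.sign(X[:, 0])
        if flip_frac > 0:
            idx = torch.randperm(n)[: int(flip_frac * n)]
            y[idx] = -y[idx]  # label-flip noise on the training set only
        return X, y

    def train_and_eval(opt_name, X_tr, y_tr, X_te, y_te, epochs=500):
        w = torch.zeros(X_tr.shape[1], requires_grad=True)
        opt = (torch.optim.Adam([w], lr=1e-3) if opt_name == "adam"
               else torch.optim.SGD([w], lr=1e-2))
        for _ in range(epochs):
            opt.zero_grad()
            # Logistic loss for labels in {-1, +1}
            loss = torch.nn.functional.softplus(-y_tr * (X_tr @ w)).mean()
            loss.backward()
            opt.step()
        with torch.no_grad():
            # Clean-test accuracy of the sign classifier
            return ((X_te @ w).sign() == y_te).float().mean().item()

    X_tr, y_tr = make_data(500, flip_frac=0.3)   # noisy training set
    X_te, y_te = make_data(2000, flip_frac=0.0)  # clean test set
    for name in ("sgd", "adam"):
        print(name, train_and_eval(name, X_tr, y_tr, X_te, y_te))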

Highlights

  • Deep Neural Networks (DNNs) [1] are high-capacity models in which the number of learnable parameters is very large relative to the finite amount of training data

  • We analytically show that adaptive gradient methods completely fail to learn any patterns from the data and do not generalize to the clean test set

  • We review previous work on generalization in neural networks and its dependence on the optimization strategy


Summary

INTRODUCTION

DNNs [1] are high-capacity models in which the number of learnable parameters is very large relative to the finite amount of training data. We benchmark and compare the performance of adaptive and non-adaptive gradient methods in the presence of training perturbations, a practical problem that arises with noisy acquisition devices. Benchmarking the robustness of optimizers against such training noise is an important task: it measures whether the features selected by the optimizer conform to semantic information such as color and shape (for images), which enables better generalization to noiseless test samples. Based on our benchmarking results, in the presence of training noise we suggest using SGD-based optimizers with learning-rate tuning instead of adaptive gradient methods for better generalization performance. A sketch of this benchmarking protocol follows.
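
The sketch below is one way to realize the protocol described above, not the paper's exact setup: the dataset (MNIST), the training perturbation (additive Gaussian pixel noise), the small CNN, and all hyperparameters are illustrative assumptions. It trains the same model with several optimizers on corrupted training images and reports accuracy on the clean test set.

    # Hedged sketch of the benchmarking protocol: train on noise-corrupted
    # images, evaluate on the clean test set. Dataset, noise model,
    # architecture, and hyperparameters are illustrative assumptions.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    def noisy(x, sigma=0.5):
        return x + sigma * torch.randn_like(x)  # additive Gaussian pixel noise

    train_tf = transforms.Compose([transforms.ToTensor(), noisy])
    test_tf = transforms.ToTensor()  # the test set stays clean
    train_dl = DataLoader(
        datasets.MNIST(".", train=True, download=True, transform=train_tf),
        batch_size=128, shuffle=True)
    test_dl = DataLoader(
        datasets.MNIST(".", train=False, transform=test_tf), batch_size=256)

    def make_model():
        # Small CNN classifier for 1x28x28 inputs
        return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                             nn.MaxPool2d(2), nn.Flatten(),
                             nn.Linear(16 * 14 * 14, 10))

    optimizers = {
        "sgd":  lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
        "adam": lambda p: torch.optim.Adam(p, lr=1e-3),
    }

    for name, make_opt in optimizers.items():
        model, loss_fn = make_model(), nn.CrossEntropyLoss()
        opt = make_opt(model.parameters())
        for epoch in range(2):  # short run for illustration
            for x, y in train_dl:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        with torch.no_grad():
            correct = sum((model(x).argmax(1) == y).sum().item()
                          for x, y in test_dl)
        acc = correct / len(test_dl.dataset)
        print(f"{name}: clean test accuracy = {acc:.3f}")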

RELATED WORK
COMMON FRAMEWORK FOR OPTIMIZATION
BENCHMARKING OPTIMIZERS ON HIGH-DIMENSIONAL TRAINING DATA
EXPERIMENTAL RESULTS
CONCLUSIONS
