Abstract

Artificial neural networks have proved useful in a host of demanding applications and are therefore becoming increasingly important in science and engineering. Large-scale problems pose a challenge for training neural networks with the stochastic gradient descent method and its variants, which rely on the random selection of mini-batches of training points at every iteration. The challenge lies in the mandatory use of diminishing step sizes to keep error fluctuations mild across the training set, thereby preserving the quality of the network's generalization capability. Variance counterbalancing was recently proposed as a remedy for the diminishing step sizes in neural network training with stochastic gradient methods. It is based on the concurrent minimization of the network's average mean squared error and the variance of that error over random sets of mini-batches. It also promotes the use of advanced optimization algorithms in place of the slowly convergent gradient descent. The present work aims to enrich our understanding of the original variance counterbalancing approach and to reformulate it as a multi-objective problem by exploiting its bi-objective nature. Experimental analysis reveals the performance of the studied approaches and their competitive edge over the established Adam method.
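The sketch below illustrates the kind of objective the abstract describes: the average mean squared error over a random set of mini-batches combined with the variance of those per-batch errors. It is only a minimal illustration under stated assumptions; the toy linear model, the weighting factor `lam`, and the batch-sampling scheme are placeholders and not the paper's exact formulation.

```python
# Hypothetical sketch of a variance-counterbalancing-style objective:
# average per-mini-batch MSE plus a penalty on the variance of those MSEs.
# The linear "network" and the weight `lam` are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a toy linear model standing in for a neural network.
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
w = rng.normal(size=5)          # trainable parameters

def batch_mse(w, idx):
    """Mean squared error of the model on one mini-batch."""
    pred = X[idx] @ w
    return np.mean((pred - y[idx]) ** 2)

def vc_objective(w, batches, lam=1.0):
    """Average MSE over the mini-batches plus a variance penalty."""
    errors = np.array([batch_mse(w, idx) for idx in batches])
    return errors.mean() + lam * errors.var()

# Draw a random set of mini-batches and evaluate the combined objective;
# any (preferably fast-converging) optimizer could then minimize it in w.
batches = [rng.choice(len(X), size=32, replace=False) for _ in range(8)]
print(vc_objective(w, batches))
```

In a multi-objective reformulation, the two terms (average error and error variance) would instead be treated as separate objectives rather than combined through a single weight.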
