Abstract
Mini-batch sub-sampling (MBSS) in neural network training is unavoidable due to growing data demands, memory-limited computational resources such as graphics processing units, and the dynamics of online learning. This study distinguishes between static MBSS and dynamic MBSS. In static MBSS, mini-batches are intermittently fixed during training, resulting in smooth but biased loss functions. In dynamic MBSS, mini-batches are resampled at every loss evaluation, trading sampling bias for sampling variance and introducing sampling-induced discontinuities. This renders classical minimization strategies ineffective for dynamic MBSS losses, as they may locate spurious sampling-induced minima, while critical points may not exist at all. This paper re-evaluates the information used to define optima in the stochastic loss functions of neural networks by defining the solution of a stochastic optimization problem as the stochastic non-negative associated gradient projection point (SNN-GPP). We demonstrate that SNN-GPPs offer a more robust description of full-batch optima than minimizers and critical points. An empirical investigation compares local minima to SNN-GPPs for a simple neural network training problem with different activation functions. Since SNN-GPPs better approximate the location of true optima, we conclude that line searches locating SNN-GPPs can contribute significantly to automating neural network training.
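The static versus dynamic MBSS distinction can be illustrated with a minimal sketch. The snippet below is not the paper's code; the toy dataset, quadratic loss, and function names are assumptions chosen only to show that a fixed mini-batch yields a deterministic (smooth but biased) loss, while resampling at every evaluation makes repeated evaluations at the same parameter value fluctuate, i.e. sampling-induced discontinuities.

```python
import numpy as np

# Illustrative sketch (hypothetical names): contrasting static and dynamic
# mini-batch sub-sampling (MBSS) on a toy one-parameter quadratic loss.
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)  # toy dataset
batch_size = 32

def batch_loss(x, batch):
    """Mean squared deviation of the scalar parameter x from the sampled points."""
    return np.mean((x - batch) ** 2)

def static_mbss_loss(x, fixed_batch):
    """Static MBSS: the mini-batch stays fixed between evaluations,
    giving a smooth but biased approximation of the full-batch loss."""
    return batch_loss(x, fixed_batch)

def dynamic_mbss_loss(x):
    """Dynamic MBSS: a new mini-batch is drawn at every loss evaluation,
    trading sampling bias for sampling variance."""
    batch = rng.choice(data, size=batch_size, replace=False)
    return batch_loss(x, batch)

fixed_batch = rng.choice(data, size=batch_size, replace=False)
x = 2.5
# Repeated evaluations at the same point: static MBSS is deterministic,
# dynamic MBSS fluctuates around the full-batch value.
print([round(static_mbss_loss(x, fixed_batch), 4) for _ in range(3)])
print([round(dynamic_mbss_loss(x), 4) for _ in range(3)])
print(round(batch_loss(x, data), 4))  # full-batch reference
```

Under dynamic MBSS, a minimizer of any single realization of the loss is generally a spurious, sampling-induced minimum, which is why the paper argues for characterizing solutions via gradient information (SNN-GPPs) rather than function values.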