Abstract
Mini-batch sub-sampling (MBSS) in neural network training is unavoidable due to growing data demands, memory-limited computational resources such as graphics processing units, and the dynamics of online learning. This study distinguishes between static MBSS and dynamic MBSS. In static MBSS, mini-batches are intermittently fixed during training, resulting in smooth but biased loss functions. In dynamic MBSS, mini-batches are resampled at every loss evaluation, trading sampling bias for sampling variance and introducing sampling-induced discontinuities. This renders classical minimization strategies ineffective for dynamic MBSS losses, as they may locate spurious sampling-induced minima, while critical points may not exist at all. This paper re-evaluates the information used to define optima in the stochastic loss functions of neural networks by defining the solution of a stochastic optimization problem as the stochastic non-negative associated gradient projection point (SNN-GPP). We demonstrate that SNN-GPPs offer a more robust description of full-batch optima than minimizers and critical points. An empirical investigation compares local minima to SNN-GPPs for a simple neural network training problem with different activation functions. Since SNN-GPPs better approximate the location of true optima, we conclude that line searches locating SNN-GPPs can contribute significantly to automating neural network training.
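The static versus dynamic MBSS distinction can be illustrated with a minimal sketch. The snippet below is not the paper's code; the toy dataset, quadratic loss, and function names are assumptions chosen only to show that a fixed mini-batch yields a deterministic (smooth but biased) loss, while resampling at every evaluation makes repeated evaluations at the same parameter value fluctuate, i.e. sampling-induced discontinuities.

```python
import numpy as np

# Illustrative sketch (hypothetical names): contrasting static and dynamic
# mini-batch sub-sampling (MBSS) on a toy one-parameter quadratic loss.
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)  # toy dataset
batch_size = 32

def batch_loss(x, batch):
    """Mean squared deviation of the scalar parameter x from the sampled points."""
    return np.mean((x - batch) ** 2)

def static_mbss_loss(x, fixed_batch):
    """Static MBSS: the mini-batch stays fixed between evaluations,
    giving a smooth but biased approximation of the full-batch loss."""
    return batch_loss(x, fixed_batch)

def dynamic_mbss_loss(x):
    """Dynamic MBSS: a new mini-batch is drawn at every loss evaluation,
    trading sampling bias for sampling variance."""
    batch = rng.choice(data, size=batch_size, replace=False)
    return batch_loss(x, batch)

fixed_batch = rng.choice(data, size=batch_size, replace=False)
x = 2.5
# Repeated evaluations at the same point: static MBSS is deterministic,
# dynamic MBSS fluctuates around the full-batch value.
print([round(static_mbss_loss(x, fixed_batch), 4) for _ in range(3)])
print([round(dynamic_mbss_loss(x), 4) for _ in range(3)])
print(round(batch_loss(x, data), 4))  # full-batch reference
```

Under dynamic MBSS, a minimizer of any single realization of the loss is generally a spurious, sampling-induced minimum, which is why the paper argues for characterizing solutions via gradient information (SNN-GPPs) rather than function values.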