Abstract

Stochastic gradient descent (SGD) is a canonical tool for stochastic optimization problems, which form the bedrock of modern machine learning. In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, albeit only to within a limiting error. To do so, rather than fixing the mini-batch size and the step-size at the outset, we propose a strategy that lets these parameters evolve adaptively. Specifically, the batch-size is set to a piecewise-constant increasing sequence, where each increase occurs when a suitable error criterion is satisfied, and the step-size is selected as the one that yields the fastest convergence. The overall algorithm, the two-scale adaptive (TSA) scheme, is developed for both convex and non-convex problems. It inherits exact convergence and, more importantly, achieves the optimal rate of error decrease together with an overall reduction in computation. Furthermore, we extend the TSA method to a generalized adaptive batching framework, a generic methodology that can be attached to any stochastic algorithm that trades off convergence rate against stochastic variance. We evaluate the TSA method on image classification with the MNIST and CIFAR-10 datasets, comparing against standard SGD and existing adaptive batch-size methods to corroborate the theoretical findings.
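As a rough illustration of the adaptive-batching idea summarized above, the sketch below runs mini-batch SGD on a toy least-squares problem and enlarges the batch whenever a running plateau test fires. The plateau test, step-size rule, growth factor, and window length are placeholder assumptions for illustration only; they are not the error criterion or step-size selection derived in the paper.

```python
# Minimal sketch of adaptive-batching SGD in the spirit of the TSA scheme.
# The error criterion, step-size rule, and batch growth factor below are
# illustrative placeholders, not the rules derived in the paper.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 20))          # toy least-squares data
b = rng.standard_normal(10_000)
x = np.zeros(20)

batch_size, growth = 16, 2                     # piecewise-constant batch size, grown when the criterion fires
window, recent = 50, []                        # running window used by the (assumed) error criterion

for t in range(2_000):
    idx = rng.integers(0, len(b), size=batch_size)
    grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size        # mini-batch stochastic gradient

    # Placeholder step-size: inverse of a mini-batch smoothness estimate.
    step = 1.0 / (np.linalg.norm(A[idx], 2) ** 2 / batch_size + 1e-8)
    x -= step * grad

    # Placeholder error criterion: if the gradient norm has plateaued over the
    # window, assume the stochastic-variance floor is reached and grow the batch.
    recent.append(np.linalg.norm(grad))
    if len(recent) > window:
        recent.pop(0)
        if np.mean(recent[-10:]) > 0.95 * np.mean(recent[:10]):
            batch_size = min(batch_size * growth, len(b))
            recent.clear()
```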
