Abstract—Optimization algorithms determine the model parameters that yield accurate predictions in complex machine learning (ML) problems, and they are crucial when handling large-scale data, where traditional approaches to processing datasets become computationally inefficient. This paper provides an in-depth comparison of the two most widely used optimization techniques in machine learning: stochastic gradient descent (SGD) and batch gradient descent (BGD). Our simulation results indicate that although SGD converges very quickly at first, its efficiency tends to degrade as the number of iterations grows. In contrast, BGD starts more slowly but remains comparatively consistent over the long run. We further examine how variations in the learning rate affect the performance of both methods; our analysis shows that adaptive learning rates drastically accelerate convergence. Finally, we show that the computational efficiency of SGD makes it the better choice for scaling, since gradients can be computed on a per-sample basis.

Keywords—Optimization algorithms, Stochastic gradient descent, Batch gradient descent, Machine learning, Large-scale data, Adaptive learning rate.
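To make the contrast concrete, the following is a minimal sketch of the two update rules on an ordinary least-squares problem: BGD performs one update per pass over the full dataset, while SGD updates after every sample, which is what keeps its per-step cost independent of dataset size. The synthetic data, learning rates, and function names here are illustrative assumptions, not taken from the paper's experiments.

```python
# Minimal sketch contrasting batch vs. stochastic gradient descent
# on least-squares regression; data and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # 1000 samples, 5 features
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

def grad(w, Xb, yb):
    """Gradient of the mean squared error over the rows in (Xb, yb)."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

def batch_gd(w, lr=0.1, epochs=50):
    # BGD: one update per epoch, using the gradient over the full dataset.
    for _ in range(epochs):
        w = w - lr * grad(w, X, y)
    return w

def sgd(w, lr=0.01, epochs=50):
    # SGD: one update per sample, so each step stays cheap as data grows.
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w = w - lr * grad(w, X[i:i+1], y[i:i+1])
    return w

w0 = np.zeros(5)
print("BGD parameter error:", np.linalg.norm(batch_gd(w0) - w_true))
print("SGD parameter error:", np.linalg.norm(sgd(w0) - w_true))
```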