Abstract

In the development of machine learning, first-order gradient-based optimization methods play a crucial role, and many efficient algorithms build on them. Treating the regularizer and the loss function as a single composite objective, this paper introduces three black-box methods that rely only on first-order (subgradient) information: the projected subgradient descent method (PGD), the mirror descent method (MD), and the dual averaging method (DA). We trace their development and subsequent improvements, and we analyze and compare their convergence rates and error bounds under different conditions. Because the solution of almost every model must cope with large-scale data, and because the regularizer carries a specific structural meaning in machine learning models, we also summarize the conditions under which the regularized problem can be solved while preserving the structure of the solution, together with structure-exploiting optimization methods and their refinements. We further classify and summarize stochastic optimization methods, distilling the mathematical techniques used in their convergence proofs and discussing their significance in the current era of big data. Finally, since the common assumption that noisy gradient estimates are unbiased often fails in practice, we pose several problems worth further study under the condition that the gradient information is biased.
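For reference, the standard textbook update rules of the three methods can be sketched as follows (a canonical form only; the exact variants and step-size schedules analyzed in the paper may differ). Here f is a convex objective, g_t is a subgradient of f at x_t, eta_t is the step size, X is the feasible set, psi is a strongly convex mirror map with Bregman divergence D_psi, and beta_t is a regularization sequence:

\[
\text{PGD:}\quad x_{t+1} = \Pi_X\bigl(x_t - \eta_t g_t\bigr), \qquad g_t \in \partial f(x_t)
\]
\[
\text{MD:}\quad x_{t+1} = \arg\min_{x \in X}\;\bigl\{ \eta_t \langle g_t, x \rangle + D_\psi(x, x_t) \bigr\}
\]
\[
\text{DA:}\quad x_{t+1} = \arg\min_{x \in X}\;\Bigl\{ \Bigl\langle \sum_{s=1}^{t} g_s,\, x \Bigr\rangle + \beta_t\, \psi(x) \Bigr\}
\]

With the Euclidean mirror map psi(x) = (1/2)||x||_2^2, mirror descent reduces to projected subgradient descent, which is why the three methods are naturally studied together as first-order black-box schemes.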
