The maximum correntropy criterion (MCC) has become an important method in the machine learning and signal processing communities since it was successfully applied in various non-Gaussian noise scenarios. In comparison with the classical least squares (LS) method, which takes only the second-order moments of the model into consideration and leads to a convex optimization problem, MCC captures higher-order information of the model that plays a crucial role in robust learning, but it usually requires solving a non-convex optimization problem. As is well known, theoretical research on convex optimization has made significant achievements, while the theoretical understanding of non-convex optimization is still far from mature. Motivated by the popularity of stochastic gradient descent (SGD) for solving non-convex problems, this paper considers SGD applied to the kernel version of MCC, which has been shown to be robust to outliers and non-Gaussian data in nonlinear models. As existing theoretical results for the SGD algorithm applied to kernel MCC are not well established, we present a rigorous analysis of its convergence behavior and provide explicit convergence rates under standard conditions. Our work fills the gap between the optimization process and convergence during the iterations: the analysis requires the iterates to converge to the global minimizer, whereas the estimator obtained in the learning process cannot guarantee global optimality.
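For concreteness, the following is a minimal sketch of a standard kernel MCC formulation and the corresponding SGD update, assuming the correntropy is induced by a Gaussian kernel with bandwidth $\sigma$ and the estimator lies in a reproducing kernel Hilbert space $\mathcal{H}_K$; the notation is illustrative and may differ from the paper's. Minimizing the correntropy-induced loss
\[
\ell_\sigma(u) \;=\; \sigma^2\Bigl(1 - \exp\Bigl(-\tfrac{u^2}{2\sigma^2}\Bigr)\Bigr),
\qquad
\min_{f \in \mathcal{H}_K}\; \frac{1}{n}\sum_{i=1}^{n} \ell_\sigma\bigl(y_i - f(x_i)\bigr),
\]
is equivalent to maximizing the empirical correntropy, and one SGD step on a sample $(x_t, y_t)$ with step size $\eta_t$ (writing $K_{x_t} = K(x_t,\cdot)$) reads
\[
f_{t+1} \;=\; f_t \;+\; \eta_t\,\bigl(y_t - f_t(x_t)\bigr)\,
\exp\Bigl(-\tfrac{(y_t - f_t(x_t))^2}{2\sigma^2}\Bigr)\,K_{x_t}.
\]
As $\sigma \to \infty$, $\ell_\sigma(u) \to u^2/2$ and the update reduces to the kernel least-squares SGD step, while a finite $\sigma$ down-weights samples with large residuals, which is the source of MCC's robustness and of the non-convexity discussed above.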