Abstract

Regression methods with truncated loss functions are widely valued for their robustness to outliers and for yielding solutions that are sparse in the samples. However, because the truncated loss is non-convex, commonly used algorithms such as the difference-of-convex algorithm (DCA) fail to maintain sparsity, and adapting DCA for efficient optimization incurs additional development cost. To address these challenges, we propose truncated loss regression via the majorization-minimization algorithm (TLRM). TLRM employs a surrogate function to approximate the original truncated loss regression and offers several desirable properties: (i) it eliminates outliers before training and encapsulates general convex loss regression within its iterative subproblems; (ii) it solves a convex loss problem at each iteration, so the well-established toolbox of convex optimization applies directly; and (iii) it converges to a truncated loss regression and yields a solution with sample sparsity. Extensive experiments demonstrate that TLRM achieves superior sparsity without sacrificing robustness and can be several tens of thousands of times faster than traditional DCA on large-scale problems. Moreover, TLRM scales to datasets with millions of samples, making it a practical choice for real-world scenarios. The codebase for methods with truncated loss functions is available at https://i-do-lab.github.io/optimal-group.org/Resources/Code/TLRM.html.
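As a rough illustration of the majorization-minimization scheme sketched above, the snippet below applies the idea to a truncated squared loss with a ridge penalty: samples whose current loss exceeds the truncation level are dropped (the outlier-elimination step), and a standard convex subproblem is solved on the remaining samples. This is a minimal sketch under assumed choices, not the paper's implementation; the squared loss, the ridge penalty `lam`, and all names are illustrative.

```python
import numpy as np

def tlrm_sketch(X, y, tau, lam=1.0, n_iter=50):
    """Hypothetical MM sketch for truncated squared-loss ridge regression.

    The truncated loss min(r**2, tau) is majorized at the current residuals
    by keeping the convex loss on samples with r**2 <= tau and a constant
    elsewhere, so each subproblem is a convex (ridge) regression restricted
    to the retained samples.
    """
    n, d = X.shape
    w = np.zeros(d)
    mask = np.ones(n, dtype=bool)
    for _ in range(n_iter):
        r = y - X @ w
        # Samples whose current loss exceeds the truncation level tau are
        # treated as outliers and excluded from this subproblem.
        mask = r**2 <= tau
        Xs, ys = X[mask], y[mask]
        # Closed-form ridge solution on the retained (inlier) samples;
        # any convex-loss solver could be substituted here.
        w_new = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w, mask  # mask marks the samples kept in the final subproblem
```

Because the surrogate touches the truncated loss at the current iterate and upper-bounds it everywhere, each subproblem solve cannot increase the original objective, which is the standard MM descent argument.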
