Improved high-dimensional regression models with matrix approximations applied to the comparative case studies with support vector machines

Mahdi Roozbeh,Saman Babaie-Kafaki,Zohre Aminifard

doi:10.1080/10556788.2021.2022144

Abstract

Nowadays, high-dimensional data appear in many practical applications such as biosciences. In the regression analysis literature, the well-known ordinary least-squares estimation may be misleading when the full ranking of the design matrix is missed. As a popular issue, outliers may corrupt normal distribution of the residuals. Thus, since not being sensitive to the outlying data points, robust estimators are frequently applied in confrontation with the issue. Ill-conditioning in high-dimensional data is another common problem in modern regression analysis under which applying the least-squares estimator is hardly possible. So, it is necessary to deal with estimation methods to tackle these problems. As known, a successful approach for high-dimension cases is the penalized scheme with the aim of obtaining a subset of effective explanatory variables that predict the response as the best, while setting the other parameters to zero. Here, we develop several penalized mixed-integer nonlinear programming models to be used in high-dimension regression analysis. The given matrix approximations have simple structures, decreasing computational cost of the models. Moreover, the models are effectively solvable by metaheuristic algorithms. Numerical tests are made to shed light on performance of the proposed methods on simulated and real world high-dimensional data sets.

Full Text