Abstract

Support vector regression (SVR) and its variants are widely used regression algorithms that have demonstrated high generalization ability. This research proposes a new SVR-based regressor: the v-minimum absolute deviation distribution regression (v-MADR) machine. Instead of merely minimizing structural risk, as v-SVR does, v-MADR aims to achieve better generalization performance by minimizing both the absolute regression deviation mean and the absolute regression deviation variance, thereby accounting for both the positive and negative values of the regression deviation of sample points. For optimization, we propose a dual coordinate descent (DCD) algorithm for small-sample problems and an averaged stochastic gradient descent (ASGD) algorithm for large-scale problems. Furthermore, we study the statistical properties of v-MADR, which lead to a bound on the expected error. Experimental results on both artificial and real datasets indicate that v-MADR achieves significantly better generalization performance with less training time than the widely used v-SVR, LS-SVR, ε-TSVR, and linear ε-SVR. Finally, we open-source the code of v-MADR at https://github.com/AsunaYY/v-MADR for wider dissemination.
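The two deviation statistics named above are straightforward to compute for a fitted regressor. The sketch below is a minimal illustration, not the paper's formulation: the helper name `deviation_statistics` and the linear model `f(x) = Xw + b` are illustrative assumptions, and the paper's actual objective embeds these statistics in a v-SVR-style optimization problem rather than evaluating them post hoc.

```python
import numpy as np

def deviation_statistics(w, b, X, y):
    """Absolute regression deviation mean and variance for f(x) = Xw + b.

    These are the two quantities v-MADR minimizes alongside the v-SVR
    loss, per the abstract; the exact weighting in the objective is
    defined in the paper, not here.
    """
    dev = np.abs(X @ w + b - y)          # |f(x_i) - y_i| for each sample
    return dev.mean(), dev.var()

# Toy data: a known linear model plus small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

mean_dev, var_dev = deviation_statistics(w_true, 0.0, X, y)
```

For the true weights, the deviation mean tracks the noise scale and the variance stays small; a regressor with the same mean deviation but larger variance would be penalized more under this criterion.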

Highlights

  • Support vector regression (SVR) [1]–[3] has been widely used in machine learning, since it can achieve better structural risk minimization

  • We empirically evaluate the performance of our v-minimum absolute deviation distribution regression (v-MADR) compared with other SVR-based algorithms, including v-support vector regression (v-SVR), LS-SVR, ε-TSVR, and linear ε-support vector regression (ε-SVR), on several datasets: two artificial datasets, eight medium-scale datasets, and six large-scale datasets

  • All experiments are conducted in MATLAB R2014a on a PC with a 2.00 GHz CPU and 32 GB memory. v-SVR is solved by LIBSVM [49]; linear ε-SVR is solved by LIBLINEAR [50]; LS-SVR is solved by LSSVMlab [51]; and ε-TSVR is solved by the SOR technique [52], [53]


Summary

INTRODUCTION

Support vector regression (SVR) [1]–[3] has been widely used in machine learning, since it can achieve better structural risk minimization. Many variants have been proposed, such as twin support vector regression [18] and Lagrangian twin support vector regression (LTSVR) [19]. These algorithms demonstrate a good ability to capture data structure and boundary information. Inspired by the idea of LDM, Liu et al. proposed minimum deviation distribution regression (MDR) [24], which introduced the statistics of the regression deviation into ε-SVR. Considering the above advances in SVR, in this research we introduce this statistical information into v-SVR and propose a v-minimum absolute deviation distribution regression (v-MADR). Inspired by recent theoretical results [20]–[24], v-MADR simultaneously minimizes the absolute regression deviation mean and the absolute regression deviation variance based on the v-SVR strategy, thereby greatly improving the generalization performance [21], [23].
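The abstract names averaged stochastic gradient descent (ASGD) as the large-scale solver. The sketch below shows only the averaging mechanism on a simplified surrogate objective, not the paper's algorithm: the function name `asgd_mad`, the parameters `lam` and `lr`, and the use of the deviation's second moment as a stochastic-friendly stand-in for the variance term are all illustrative assumptions.

```python
import numpy as np

def asgd_mad(X, y, lam=0.1, lr=0.01, epochs=20, seed=0):
    """ASGD on a simplified surrogate: mean |r| + lam * mean r^2.

    r = f(x) - y is the regression deviation; averaging the iterates
    (the 'A' in ASGD) smooths out the noise of per-sample updates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    w_avg, b_avg = np.zeros(d), 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            r = X[i] @ w + b - y[i]
            g = np.sign(r) + 2.0 * lam * r   # subgradient w.r.t. the residual
            w -= lr * g * X[i]
            b -= lr * g
            t += 1
            # Running average of the iterates, updated in O(d) per step.
            w_avg += (w - w_avg) / t
            b_avg += (b - b_avg) / t
    return w_avg, b_avg

# Toy usage: recover a linear model from noisy observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)
w_fit, b_fit = asgd_mad(X, y)
mae = np.mean(np.abs(X @ w_fit + b_fit - y))
```

The returned averaged iterate is typically much less sensitive to the step size than the final raw iterate, which is the usual motivation for preferring ASGD over plain SGD on large-scale problems.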

BACKGROUND
RECENT PROGRESS IN SV THEORY
FORMULATION OF v-MADR
EXPERIMENTAL RESULTS
ARTIFICIAL DATASETS
CONCLUSION
