Abstract

The support vector machine (SVM) is an effective classification and regression method that uses machine learning theory to maximize predictive accuracy while avoiding overfitting of the data. L2 regularization is commonly used; however, if the training dataset contains many noise variables, L1 regularization SVM provides better performance. Neither L1 nor L2 is the optimal regularization method when the data contain a large number of redundant values and only a small number of data points are useful for learning. We have therefore proposed an adaptive learning algorithm using the iterative reweighted p-norm regularization support vector machine (IRWP-SVM) for 0 < p ≤ 2. A simulated data set was created to evaluate the algorithm, and a p value of 0.8 was shown to produce a better feature selection rate with high accuracy. Four cancer data sets from public data banks were also used for the evaluation. All four evaluations show that the new adaptive algorithm achieves the optimal prediction error using a p value less than 1, i.e., below the L1 norm. Moreover, we observe that the proposed Lp penalty is more robust to noise variables than the L1 and L2 penalties.
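
For orientation, the Lp-penalized linear SVM objective referred to above can be written in the following general form (a sketch based on the standard hinge-loss formulation; the trade-off constant C and the exact loss weighting are assumptions, as they are not given in this excerpt):

$$
\min_{w,\,b}\ \sum_{j=1}^{d} \lvert w_j \rvert^{p} \;+\; C \sum_{i=1}^{n} \max\!\bigl(0,\ 1 - y_i\,(w^{\top} x_i + b)\bigr),
\qquad 0 < p \le 2,
$$

where p = 2 and p = 1 recover the usual L2- and L1-regularized SVMs, and values of p below 1 encourage sparser weight vectors.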

Highlights

  • Support vector machine (SVM) has been shown to be an effective classification and regression method that uses machine learning theory to maximize the predictive accuracy while avoiding overfitting of data [1]

  • The results to follow will show that the IRWP-SVM is able to remove irrelevant variables and identify relevant variables when, as is typical, the dimension of the samples is larger than the number of training points

  • We have presented an adaptive learning algorithm using the iterative reweighted p-norm regularization support vector machine for 0 < p ≤ 2

Introduction

The support vector machine (SVM) has been shown to be an effective classification and regression method that uses machine learning theory to maximize predictive accuracy while avoiding overfitting of the data [1]. The L2 regularization method is usually used in the standard SVM and works well when the dataset does not contain too much noise. If the training data set contains many noise variables, the L1 regularization SVM provides better performance. Because the penalty function is fixed before training, SVM algorithms sometimes work very well but at other times are unsatisfactory. In many applications the training data set contains a large number of redundant values, and only a small number of data points are useful for learning; this is especially common in bioinformatics.
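
To make the role of the reweighting concrete, the sketch below shows one common way an iterative reweighted p-norm penalty can be attached to a linear SVM: the non-convex Lp penalty is approximated at each outer pass by a weighted L2 penalty whose weights come from the previous estimate. This is an illustrative sketch under those assumptions, not the authors' exact IRWP-SVM; the function name, hyperparameters, and the subgradient-descent solver are hypothetical.

    import numpy as np

    def irwp_svm(X, y, p=0.8, lam=0.1, n_outer=15, n_inner=300, lr=0.01, eps=1e-4):
        """Illustrative iteratively reweighted Lp-regularized linear SVM (sketch).

        Each outer pass replaces the non-convex penalty sum_j |w_j|**p by the
        surrogate sum_j d_j * w_j**2 with d_j = (|w_j| + eps)**(p - 2) taken
        from the previous estimate, then minimizes hinge loss + surrogate by
        plain subgradient descent. Names and hyperparameters are hypothetical.
        """
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        dweights = np.ones(d)              # first pass == ordinary L2-penalized SVM
        for _ in range(n_outer):
            for _ in range(n_inner):
                margins = y * (X @ w + b)
                active = margins < 1       # samples violating the margin
                grad_w = lam * dweights * w - (y[active] @ X[active]) / n
                grad_b = -np.sum(y[active]) / n
                w -= lr * grad_w
                b -= lr * grad_b
            # Reweighting step: small coefficients get large penalty weights,
            # which pushes noise variables toward zero when p < 1.
            dweights = (np.abs(w) + eps) ** (p - 2)
        return w, b

    # Toy usage: 10 informative and 90 pure-noise features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 100))
    y = np.sign(X[:, :10] @ rng.normal(size=10))
    w, b = irwp_svm(X, y, p=0.8)
    print("features kept:", int(np.sum(np.abs(w) > 1e-3)))

With p = 1 the surrogate is the standard iteratively reweighted approximation to an L1 penalty, and with p = 2 the reweighting step is a no-op, so the same loop covers the whole 0 < p ≤ 2 range discussed above.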
