Abstract
The Support Vector Machine (SVM) is a popular classification paradigm in machine learning and has achieved great success in real applications. However, the standard SVM cannot select variables automatically, and therefore its solution typically utilizes all the input variables without discrimination. This makes it difficult to identify important predictor variables, which is often one of the primary goals in data analysis. In this paper, we propose two novel types of regularization in the context of the multicategory SVM (MSVM) for simultaneous classification and variable selection. The MSVM generally requires estimation of multiple discriminating functions and applies the argmax rule for prediction. For each individual variable, we propose to characterize its importance by the sup-norm of its coefficient vector associated with the different functions, and then minimize the MSVM hinge loss function subject to a penalty on the sum of sup-norms. To further improve the sup-norm penalty, we propose adaptive regularization, which allows different weights to be imposed on different variables according to their relative importance. Both types of regularization automate variable selection in the process of building classifiers, and lead to sparse multi-classifiers with enhanced interpretability and improved accuracy, especially for high-dimensional, low-sample-size data. A major advantage of the sup-norm penalty is its easy implementation via standard linear programming. Several simulated examples and one real gene data analysis demonstrate the outstanding performance of the adaptive sup-norm penalty in various data settings.
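The penalized formulation sketched in the abstract can be written out as follows; this display is a reconstruction under assumed notation (linear functions $f_k(x) = b_k + x^\top w_k$ for $K$ classes, $n$ samples, and $p$ variables), not a formula quoted from the paper:

$$
\min_{\{w_k,\, b_k\}}\; \sum_{i=1}^{n} \sum_{k \neq y_i} \bigl[f_k(x_i) + 1\bigr]_+ \;+\; \lambda \sum_{j=1}^{p} \tau_j \max_{1 \le k \le K} |w_{kj}|,
\qquad \text{subject to}\; \sum_{k=1}^{K} f_k(\cdot) = 0,
$$

where $[u]_+ = \max(u, 0)$ is the hinge function and the weights $\tau_j$ are all equal to 1 for the plain sup-norm penalty, or are chosen inversely proportional to the estimated importance of variable $j$ for the adaptive version.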
Highlights
While the Support Vector Machine (SVM) outperforms many other methods in terms of classification accuracy in numerous real problems, the implicit nature of its solution makes it less useful for providing insight into the predictive ability of individual variables.
Variable selection becomes more complex than in the binary case, since the multicategory SVM (MSVM) requires estimation of multiple discriminating functions, each of which has its own subset of important predictors.
In contrast to the L1 MSVM, which imposes a penalty on the sum of absolute values of all coefficients, we penalize the sup-norm of the coefficients associated with each variable, as sketched below.
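Since the abstract notes that the sup-norm penalty admits an implementation via standard linear programming, a minimal sketch of such a fit is given below, written with the generic convex-optimization library cvxpy rather than the authors' code. The function names, data shapes, and the default equal weights are illustrative assumptions; passing adaptive weights `tau` recovers the weighted (adaptive) version.

```python
import cvxpy as cp
import numpy as np

def fit_supnorm_msvm(X, y, lam, tau=None):
    """Fit a linear MSVM with a (weighted) sup-norm penalty.

    X : (n, p) feature matrix; y : (n,) integer labels in {0, ..., K-1};
    lam : regularization parameter; tau : optional (p,) adaptive weights.
    Returns the coefficient matrix W (K, p) and intercepts b (1, K).
    """
    n, p = X.shape
    K = int(y.max()) + 1
    tau = np.ones(p) if tau is None else tau

    W = cp.Variable((K, p))      # row k holds the coefficients of f_k
    b = cp.Variable((1, K))      # intercepts

    # Class scores f_k(x_i) for all samples and classes: shape (n, K).
    scores = X @ W.T + np.ones((n, 1)) @ b

    # MSVM hinge loss: sum over samples i and "wrong" classes k != y_i
    # of [f_k(x_i) + 1]_+, selected with a 0/1 mask.
    mask = np.ones((n, K))
    mask[np.arange(n), y] = 0.0
    loss = cp.sum(cp.multiply(mask, cp.pos(scores + 1)))

    # Sup-norm penalty: for each variable j, the max over classes of |w_kj|,
    # weighted by tau_j (tau_j = 1 recovers the plain sup-norm penalty).
    penalty = cp.sum(cp.multiply(tau, cp.max(cp.abs(W), axis=0)))

    # Sum-to-zero constraints remove the redundancy among the K functions.
    constraints = [cp.sum(W, axis=0) == 0, cp.sum(b) == 0]

    cp.Problem(cp.Minimize(loss + lam * penalty), constraints).solve()
    return W.value, b.value

def predict(W, b, X):
    """Predict with the argmax rule over the K estimated functions."""
    return np.argmax(X @ W.T + b, axis=1)
```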
Summary
The sup-norm penalty shrinks the sum of the two maximums corresponding to the two variables, which helps produce more parsimonious models. In contrast to the L1 penalty, the sup-norm exploits the group structure of the decision function vector, so the sup-norm MSVM can deliver better variable selection. For three-class problems, we show that the L1 MSVM and the newly proposed sup-norm MSVM give identical solutions after adjusting the tuning parameters, which is due to the sum-to-zero constraints on the w(j)'s. We use leave-one-out cross-validation of the misclassification rate to select λ.
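The summary states that λ is chosen by leave-one-out cross-validation of the misclassification rate. A hedged sketch of that selection loop, reusing the illustrative `fit_supnorm_msvm` and `predict` helpers defined above (hypothetical names, not the paper's implementation):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

def select_lambda(X, y, lambdas, tau=None):
    """Pick the lambda with the smallest leave-one-out misclassification rate."""
    loo_error = []
    for lam in lambdas:
        errors = 0
        for train_idx, test_idx in LeaveOneOut().split(X):
            W, b = fit_supnorm_msvm(X[train_idx], y[train_idx], lam, tau)
            errors += int(predict(W, b, X[test_idx])[0] != y[test_idx][0])
        loo_error.append(errors / len(y))
    return lambdas[int(np.argmin(loo_error))]
```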