Abstract

BackgroundClassification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution.ResultsFeature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error.Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations.ConclusionsThe proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters.The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'.We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.

Highlights

  • Classification and variable selection play an important role in knowledge discovery in highdimensional data

  • Simulation study Simulation design A comprehensive simulation study evaluating the performance of four feature selection Support Vector Machine (SVM) classifiers, L1 SVM, Smoothly clipped absolute deviation penalty (SCAD) SVM, Elastic Net SVM and Elastic SCAD SVM, was performed

  • The SCAD SVM reduced the number of features from 4919 to 476, L1 SVM selected 1573 features, Elastic Net 109 features, and the Elastic SCAD had 459 features in the classifier

Read more

Summary

Introduction

Classification and variable selection play an important role in knowledge discovery in highdimensional data. Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and a number of feature selection procedures have been developed. Classification and prediction methods play important roles in data analysis for a wide range of applications. Classification is performed on high-dimensional data, where the number of features is much larger compared to the number of samples (’large p small n’ problem) [1]. Feature selection methods aim at finding the features most relevant for prediction. The objective of feature selection is three-fold: (i) improving the prediction performance of the predictors, (ii) providing faster and more cost-effective predictors, and (iii) gaining a deeper insight into the underlying processes that generated the data

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.