Abstract
The information needed for a given machine learning application can often be obtained from a subset of the available features, and strongly relevant features must be retained to achieve the desired model performance. This research focuses on selecting relevant independent features for Support Vector Machine (SVM) classifiers in a cost-sensitive manner. A review of recent literature on feature selection for SVM revealed a lack of linear-programming embedded SVM feature selection models; most reviewed models were mixed-integer linear or nonlinear. The review also highlighted a lack of cost-sensitive SVM feature selection models. Cost sensitivity improves the generalization of SVM feature selection models, making them applicable to situations with different costs of error, and it helps in handling imbalanced data. This research introduces an SVM-based filter method named Knapsack Max-Margin Feature Selection (KS-MMFS), a proposed linearization of the quadratic Max-Margin Feature Selection (MMFS) model. MMFS provides explicit estimates of feature importance in terms of relevance and redundancy. KS-MMFS was then used to develop a linear cost-sensitive SVM embedded feature selection model. The proposed model was tested on 11 benchmark datasets and compared with relevant models from the literature. The results show that different cost-sensitivity (i.e., sensitivity–specificity tradeoff) requirements influence which features are selected. Compared with the relevant models, the proposed model achieved an average improvement of 31.8% in classification performance with a 22.4% average reduction in solution time, demonstrating its competitive performance as an efficient cost-sensitive embedded feature selection method.
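The cost-sensitivity idea the abstract refers to can be illustrated, outside the paper's exact KS-MMFS formulation, by a linear SVM whose hinge loss carries class-dependent misclassification costs: raising the cost on the positive class pushes the boundary toward higher sensitivity at the expense of specificity. The following is a minimal sketch under that assumption; the function name, cost values, and toy data are illustrative, not the paper's model.

```python
import numpy as np

def train_cost_sensitive_svm(X, y, c_pos=5.0, c_neg=1.0,
                             lam=0.01, lr=0.1, epochs=200):
    """Linear SVM with class-dependent hinge-loss costs, trained by
    subgradient descent. c_pos / c_neg weight errors on the +1 / -1
    class, shifting the sensitivity-specificity tradeoff."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    cost = np.where(y > 0, c_pos, c_neg)   # per-sample error cost
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1               # points violating the margin
        grad_w = lam * w - (cost[active] * y[active]) @ X[active] / n
        grad_b = -np.sum(cost[active] * y[active]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy 2-D data: two overlapping Gaussian classes (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w, b = train_cost_sensitive_svm(X, y, c_pos=5.0, c_neg=1.0)
pred = np.sign(X @ w + b)
sensitivity = np.mean(pred[y == 1] == 1)
specificity = np.mean(pred[y == -1] == -1)
```

Varying `c_pos` relative to `c_neg` traces out different points on the sensitivity–specificity tradeoff, which is the "cost-of-error situations" notion the abstract describes; the paper embeds this tradeoff inside a linear feature selection model rather than a plain classifier.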